Abstract


We derive mean-unbiased estimators for the structural parameter in instrumental
variables models with a single endogenous regressor where the sign of one
or more first stage coefficients is known. In the case with a single instrument,
there is a unique non-randomized unbiased estimator based on the reduced-form
and first-stage regression estimates. For cases with multiple instruments we propose
a class of unbiased estimators and show that an estimator within this class
is efficient when the instruments are strong. We show numerically that unbiasedness
does not come at a cost of increased dispersion in models with a single
instrument: in this case the unbiased estimator is less dispersed than the 2SLS
estimator. Our finite-sample results apply to normal models with known variance
for the reduced-form errors, and imply analogous results under weak instrument
asymptotics with an unknown error distribution.
1 Introduction


Researchers often have strong prior beliefs about the sign of the first stage coefficient
in instrumental variables models, to the point where the sign can reasonably be treated
as known. This paper shows that knowledge of the sign of the first stage coefficient
allows us to construct an estimator for the coefficient on the endogenous regressor
which is unbiased in finite samples when the reduced form errors are normal with
known variance. When the distribution of the reduced form errors is unknown, our
results lead to estimators that are asymptotically unbiased under weak IV sequences as
defined in Staiger & Stock (1997).


As is well known, the conventional two-stage least squares (2SLS) estimator may
be severely biased in overidentified models with weak instruments. Indeed the most
common pretest for weak instruments, the Staiger & Stock (1997) rule of thumb which
declares the instruments weak when the first stage F statistic is less than 10, is shown
in Stock & Yogo (2005) to correspond to a test for the worst-case bias in 2SLS relative
to OLS. While the 2SLS estimator performs better in the just-identified case according
to some measures of central tendency, in this case it has no first moment.¹ A
number of papers have proposed alternative estimators to reduce particular measures
of bias, e.g. Angrist & Krueger (1995), Imbens et al. (1999), Donald & Newey (2001),
Ackerberg & Devereux (2009), and Harding et al. (2015), but none of the resulting
feasible estimators is unbiased either in finite samples or under weak instrument
asymptotics. Indeed, Hirano & Porter (2015) show that mean, median, and quantile unbiased
estimation are all impossible in the linear IV model with an unrestricted parameter
space for the first stage.

We show that by exploiting information about the sign of the first stage we can
circumvent this impossibility result and construct an unbiased estimator. Moreover,
the resulting estimators have a number of properties which make them appealing for
applications. In models with a single instrumental variable, which include many
empirical applications, we show that there is a unique unbiased estimator based on the
reduced-form and first-stage regression estimates. Moreover, we show that this
estimator is substantially less dispersed than the usual 2SLS estimator in finite samples.
Under standard ("strong instrument") asymptotics, the unbiased estimator has the same
asymptotic distribution as 2SLS, and so is asymptotically efficient in the usual sense.
In over-identified models many unbiased estimators exist, and we propose unbiased
estimators which are asymptotically efficient when the instruments are strong. Further,
we show that in over-identified models we can construct unbiased estimators which are
robust to small violations of the first stage sign restriction. We also derive a lower
bound on the risk of unbiased estimators in finite samples, and show that this bound
is attained in some models.

¹ If we instead consider median bias, 2SLS exhibits median bias when the instruments are weak,
though this bias decreases rapidly with the strength of the instruments.

In contrast to much of the recent weak instruments literature, the focus of this
paper is on estimation rather than hypothesis testing or confidence set construction.
Our approach is closely related to the classical theory of optimal point estimation
(see e.g. Lehmann & Casella (1998)) in that we seek estimators which perform well
according to conventional estimation criteria (e.g. risk with respect to a convex loss
function) within the class of unbiased estimators. As we note in Section 2.4 below,
it is straightforward to use results from the weak instruments literature to construct
identification-robust tests and confidence sets based on our estimators. As we also
note in that section, however, optimal estimation and testing are distinct problems in
models with weak instruments and it is not in general the case that optimal estimators
correspond to optimal confidence sets or vice versa. Given the important role played
by both estimation and confidence set construction in empirical practice, our results
therefore complement the literature on identification-robust testing.

The rest of this section discusses the assumption of known first stage sign, introduces
the setting and notation, and briefly reviews the related literature. Section 2 introduces
the unbiased estimator for models with a single excluded instrument. Section 3 treats
models with multiple instruments and introduces unbiased estimators which are robust
to small violations of the first stage sign restriction. Section 4 presents simulation
results on the performance of our unbiased estimators. Section 5 discusses illustrative
applications using data from Hornung (2014) and Angrist & Krueger (1991). Proofs
and auxiliary results are given in a separate appendix.²

1.1 Knowledge of the First-Stage Sign 

The results in this paper rely on knowledge of the first stage sign. This is reasonable
in many economic contexts. In their study of schooling and earnings, for instance,
Angrist & Krueger (1991) note that compulsory schooling laws in the United States
allow those born earlier in the year to drop out after completing fewer years of school
than those born later in the year. Arguing that quarter of birth can reasonably be
excluded from a wage equation, they use this fact to motivate quarter of birth as an
instrument for schooling. In this context, a sign restriction on the first stage amounts to
an assumption that the mechanism claimed by Angrist & Krueger works in the expected
direction: those born earlier in the year tend to drop out earlier. More generally,
empirical researchers often have some mechanism in mind for why a model is identified
at all (i.e. why the first stage coefficient is nonzero) that leads to a known sign for the
direction of this mechanism (i.e. the sign of the first stage coefficient).

In settings with heterogeneous treatment effects, a first stage monotonicity assumption
is often used to interpret instrumental variables estimates (see Imbens & Angrist 1994,
Heckman et al. 2006). In the language of Imbens & Angrist (1994), the monotonicity
assumption requires that either the entire population affected by the treatment
be composed of “compliers,” or that the entire population affected by the treatment be
composed of “defiers.” Once this assumption is made, our assumption that the sign of
the first stage coefficient is known amounts to assuming the researcher knows which of
these possibilities (compliers or defiers) holds. Indeed, in the examples where they argue
that monotonicity is plausible (involving draft lottery numbers in one case and intention
to treat in another), Imbens & Angrist (1994) argue that all individuals affected
by the treatment are “compliers” for a certain definition of the instrument.

² The appendix is available online at https://sites.google.com/site/isaiahandrews/working-papers

It is important to note, however, that knowledge of the first stage sign is not always a
reasonable assumption, and thus that the results of this paper are not always applicable.
In settings where the instrumental variables are indicators for groups without a natural
ordering, for instance, one typically does not have prior information about signs of the
first stage coefficients. To give one example, Aizer & Doyle Jr. (2015) use the fact that
judges are randomly assigned to study the effects of prison sentences on recidivism.
In this setting, knowledge of the first stage sign would require knowing a priori which
judges are more strict.


1.2 Setting 

For the remainder of the paper, we suppose that we observe a sample of $T$ observations
$(Y_t, X_t, Z_t')$, $t = 1, \ldots, T$, where $Y_t$ is an outcome variable, $X_t$ is a scalar
endogenous regressor, and $Z_t$ is a $k \times 1$ vector of instruments. Let $Y$ and $X$ be
$T \times 1$ vectors with row $t$ equal to $Y_t$ and $X_t$ respectively, and let $Z$ be a
$T \times k$ matrix with row $t$ equal to $Z_t'$. The usual linear IV model, written in
reduced form, is


$$ \begin{aligned} Y &= Z\pi\beta + U \\ X &= Z\pi + V \end{aligned} \tag{1} $$

To derive finite-sample results, we treat the instruments $Z$ as fixed and assume that
the errors $(U, V)$ are jointly normal with mean zero and known variance-covariance
matrix $\mathrm{Var}((U', V')')$.³ As is standard (see, for example, D. Andrews et al. (2006)),
in contexts with additional exogenous regressors $W$ (for example an intercept), we define
$Y$, $X$, $Z$ as the residuals after projecting out these exogenous regressors. If we denote
the reduced-form and first-stage regression coefficients by $\xi_1$ and $\xi_2$, respectively,
we can see that

$$ \begin{pmatrix} \xi_1 \\ \xi_2 \end{pmatrix} = \begin{pmatrix} (Z'Z)^{-1} Z'Y \\ (Z'Z)^{-1} Z'X \end{pmatrix} \sim N\left( \begin{pmatrix} \pi\beta \\ \pi \end{pmatrix}, \Sigma \right) \tag{2} $$

for

$$ \Sigma = \left(I_2 \otimes (Z'Z)^{-1} Z'\right) \mathrm{Var}\left((U', V')'\right) \left(I_2 \otimes (Z'Z)^{-1} Z'\right)'. \tag{3} $$

We assume throughout that $\Sigma$ is positive definite. Following the literature (e.g.
Moreira & Moreira (2013)), we consider estimation based solely on $(\xi_1, \xi_2)$, which are
sufficient for $(\pi, \beta)$ in the special case where the errors $(U_t, V_t)$ are iid over $t$.
All uniqueness and efficiency statements therefore restrict attention to the class of
procedures which depend on the data only through these statistics. The conventional
generalized method of moments (GMM) estimators belong to this class, so this restriction
still allows efficient estimation under strong instruments. We assume that the sign of
each component $\pi_i$ of $\pi$ is known, and in particular assume that the parameter space
for $(\pi, \beta)$ is

$$ \Theta = \left\{ (\pi, \beta) : \pi \in \Pi \subseteq (0, \infty)^k, \; \beta \in B \right\} \tag{4} $$

for some sets $\Pi$ and $B$. Note that once we take the sign of $\pi_i$ to be known, assuming
$\pi_i > 0$ is without loss of generality since this can always be ensured by redefining $Z$.

In this paper we focus on models with fixed instruments, normal errors, and known
error covariance, which allows us to obtain finite-sample results. As usual, these
finite-sample results will imply asymptotic results under mild regularity conditions. Even in
models with random instruments, non-normal errors, serial correlation, heteroskedasticity,
clustering, or any combination of these, the reduced-form and first stage estimators
will be jointly asymptotically normal with consistently estimable covariance matrix
$\Sigma$ under mild regularity conditions. Consequently, the finite-sample results we develop
here will imply asymptotic results under both weak and strong instrument asymptotics,
where we simply define $(\xi_1, \xi_2)$ as above and replace $\Sigma$ by an estimator for the
variance of $\xi$ to obtain feasible statistics. Appendix B provides the details of these
results.⁴ In the main text, we focus on what we view as the most novel component of the
paper: finite-sample mean-unbiased estimation of $\beta$ in the normal problem (2).

³ Following the weak instruments literature we focus on models with homogeneous $\beta$, which rules
out heterogeneous treatment effect models with multiple instruments. In models with treatment effect
heterogeneity and a single instrument, however, our results immediately imply an unbiased estimator
of the local average treatment effect. In models with multiple instruments, on the other hand, one can
use our results to construct unbiased estimators for linear combinations of the local average treatment
effects on different instruments. (Since the endogenous variable $X$ is typically a binary treatment in
such models, this discussion applies primarily to asymptotic unbiasedness as considered in Appendix
B rather than the finite sample model where $X$ and $Y$ are jointly normal.)
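As a rough illustration of (2) and of the feasible statistics described above, the following
Python sketch computes $\xi_1$, $\xi_2$ and an estimate of $\Sigma$ in a just-identified model
($k = 1$) after partialling out an intercept. It is a sketch under stated assumptions, not the
authors' code: the simulated design, the helper names (`demean`, `reduced_form_stats`,
`simulate`), and the choice of a heteroskedasticity-robust sandwich estimator for $\Sigma$ are
all illustrative, and the instruments are drawn at random rather than held fixed.

```python
import math
import random


def demean(v):
    """Project out an intercept, i.e. the exogenous regressor W = 1."""
    m = sum(v) / len(v)
    return [x - m for x in v]


def reduced_form_stats(y, x, z):
    """(xi1, xi2) from (2) and an estimate of the 2x2 Sigma, for k = 1.

    xi1 = (Z'Z)^{-1} Z'Y and xi2 = (Z'Z)^{-1} Z'X. Sigma is estimated by the
    sandwich (Z'Z)^{-2} sum_t Z_t^2 r_t r_t', where r_t holds the two
    reduced-form residuals (an illustrative, heteroskedasticity-robust choice).
    """
    y, x, z = demean(y), demean(x), demean(z)
    zz = sum(zt * zt for zt in z)
    xi1 = sum(zt * yt for zt, yt in zip(z, y)) / zz
    xi2 = sum(zt * xt for zt, xt in zip(z, x)) / zz
    s11 = s12 = s22 = 0.0
    for zt, yt, xt in zip(z, y, x):
        u_hat, v_hat = yt - zt * xi1, xt - zt * xi2  # reduced-form residuals
        s11 += (zt * u_hat) ** 2
        s12 += zt * zt * u_hat * v_hat
        s22 += (zt * v_hat) ** 2
    scale = 1.0 / zz ** 2
    sigma = [[s11 * scale, s12 * scale], [s12 * scale, s22 * scale]]
    return xi1, xi2, sigma


def simulate(T, pi, beta, rho, seed=0):
    """Hypothetical design: Y = Z pi beta + U, X = Z pi + V, corr(U, V) = rho."""
    rng = random.Random(seed)
    y, x, z = [], [], []
    for _ in range(T):
        zt = rng.gauss(0.0, 1.0)
        u = rng.gauss(0.0, 1.0)
        v = rho * u + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        z.append(zt)
        x.append(zt * pi + v)
        y.append(zt * pi * beta + u)
    return y, x, z


y, x, z = simulate(T=20_000, pi=0.5, beta=1.0, rho=0.5, seed=1)
xi1, xi2, sigma = reduced_form_stats(y, x, z)
print(xi1, xi2)                                      # near pi*beta = 0.5 and pi = 0.5
print([[s * 20_000 for s in row] for row in sigma])  # near [[1, 0.5], [0.5, 1]]
```

With iid errors of covariance $\Omega$, (3) reduces to $\Omega \otimes (Z'Z)^{-1}$, which here is
roughly $\Omega / T$; that is why the rescaled estimate should be close to $\Omega$.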


1.3 Related Literature 


Our unbiased IV estimators build on results for unbiased estimation of the inverse of
a normal mean discussed in Voinov & Nikulin (1993). More broadly, the literature has
studied unbiased estimation in many other settings; see Voinov & Nikulin (1993) for
details and references. Recent work by Mueller & Wang (2015) develops a numerical
approach for approximating optimal nearly unbiased estimators in a variety of
nonstandard settings, though they do not consider the linear IV model.
To our knowledge the only other paper to treat finite sample mean-unbiased estimation
in IV models is Hirano & Porter (2015), who find that unbiased estimators do not
exist when the parameter space is unrestricted. In our setting, the sign restriction on
the first-stage coefficient leads to a parameter space that violates the assumptions of
Hirano & Porter (2015), so that the negative results in that paper do not apply.⁵ The
nonexistence of unbiased estimators has been noted in other nonstandard econometric
contexts by Hirano & Porter (2012).


The broader literature on the finite sample properties of IV estimators is huge: see
Phillips (1983) and Hillier (2006) for references. While this literature does not study
unbiased estimation in finite samples, there has been substantial research on higher
order asymptotic bias properties: see the references given in the first section of the
introduction, as well as Hahn et al. (2004) and the references therein.

⁴ The feasible analogs of the finite-sample unbiased estimators discussed here are asymptotically
unbiased in general models in the sense of converging in distribution to random variables with mean $\beta$.
Note that this does not imply convergence of the mean of the feasible estimators to $\beta$, since convergence
in distribution does not suffice for convergence of moments. Our estimator is thus asymptotically unbiased
under weak and strong instruments in the same sense that LIML and just-identified 2SLS, which
do not in general have finite-sample moments, are asymptotically unbiased under strong instruments.

⁵ In particular, the sign restriction violates Assumption 2.4 of Hirano & Porter (2015), and so renders
the negative result in Theorem 2.5 of that paper inapplicable. See Appendix C for details.


Our interest in finite sample results for a normal model with known reduced form
variance is motivated by the weak IV literature, where this model arises asymptotically
under weak IV sequences as in Staiger & Stock (1997) (see also Appendix B). In contrast
to Staiger & Stock, however, our results allow for heteroskedastic, clustered, or serially
correlated errors as in Kleibergen (2007). The primary focus of recent work on weak
instruments has, however, been on inference rather than estimation. See Andrews
(2014) for additional references.

Sign restrictions have been used in other settings in the econometrics literature,
although the focus is often on inference or on using sign restrictions to improve
population bounds, rather than estimation. Recent examples include Moon et al. (2013)
and several papers cited therein, which use sign restrictions to partially identify vector
autoregression models. Inference for sign restricted parameters has been treated by
D. Andrews (2001) and Gourieroux et al. (1982), among others.


2 Unbiased Estimation with a Single Instrument 

To introduce our unbiased estimators, we first focus on the just-identified model with a
single instrument, $k = 1$. We show that unbiased estimation of $\beta$ in this context is linked
to unbiased estimation of the inverse of a normal mean. Using this fact we construct an
unbiased estimator for $\beta$, show that it is unique, and discuss some of its finite-sample
properties. We note the key role played by the first stage sign restriction, and show
that our estimator is asymptotically equivalent to 2SLS (and thus efficient) when the
instruments are strong.
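In the just-identified context $\xi_1$ and $\xi_2$ are scalars and we write

$$ \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix} = \begin{pmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{pmatrix}. $$

The problem of estimating $\beta$ therefore reduces to that of estimating

$$ \beta = \frac{\pi\beta}{\pi} = \frac{E[\xi_1]}{E[\xi_2]}. \tag{5} $$

The conventional IV estimate $\hat\beta_{2SLS} = \xi_1/\xi_2$ is the natural sample analog of (5). As is
well known, however, this estimator has no integer moments. This failure of unbiasedness
reflects the fact that the expectation of the ratio of two random variables is not in
general equal to the ratio of their expectations.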

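The form of (5) nonetheless suggests an approach to deriving an unbiased estimator.
Suppose we can construct an estimator $\hat\tau$ which (a) is unbiased for $1/\pi$ and (b) depends
on the data only through $\xi_2$. If we then define

$$ \delta(\xi, \Sigma) = \xi_1 - \frac{\sigma_{12}}{\sigma_2^2}\,\xi_2, \tag{6} $$

we have that $E[\delta] = \pi\beta - \frac{\sigma_{12}}{\sigma_2^2}\pi$ and $\delta$ is independent of $\xi_2$, and hence of $\hat\tau$.⁶ Thus,

$$ E\left[\hat\tau\,\delta(\xi, \Sigma) + \frac{\sigma_{12}}{\sigma_2^2}\right] = E[\hat\tau]\,E[\delta] + \frac{\sigma_{12}}{\sigma_2^2} = \frac{1}{\pi}\left(\pi\beta - \frac{\sigma_{12}}{\sigma_2^2}\pi\right) + \frac{\sigma_{12}}{\sigma_2^2} = \beta, $$

and $\hat\tau\,\delta(\xi, \Sigma) + \sigma_{12}/\sigma_2^2$ will be an unbiased estimator of $\beta$. Thus, the
problem of unbiased estimation of $\beta$ reduces to that of unbiased estimation of the
inverse of a normal mean.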

2.1 Unbiased Estimation of the Inverse of a Normal Mean 


A result from Voinov & Nikulin (1993) shows that unbiased estimation of $1/\pi$ is possible
if we assume its sign is known. Let $\Phi$ and $\phi$ denote the standard normal cdf and pdf
respectively.

Lemma 2.1. Define

$$ \hat\tau(\xi_2, \sigma_2^2) = \frac{1}{\sigma_2}\,\frac{1 - \Phi(\xi_2/\sigma_2)}{\phi(\xi_2/\sigma_2)}. $$

For all $\pi > 0$, $E_\pi\left[\hat\tau(\xi_2, \sigma_2^2)\right] = 1/\pi$.

⁶ Note that the orthogonalization used to construct $\delta$ is similar to that used by Kleibergen (2002),
Moreira (2003), and the subsequent weak-IV literature to construct identification-robust tests.


The derivation of $\hat\tau(\xi_2, \sigma_2^2)$ in Voinov & Nikulin (1993) relies on the theory of
bilateral Laplace transforms, and offers little by way of intuition. Verifying unbiasedness is
a straightforward calculus exercise, however: for the interested reader, we work through
the necessary derivations in the proof of Lemma 2.1.

From the formula for $\hat\tau$, we can see that this estimator has two properties which are
arguably desirable for a restricted estimate of $1/\pi$. First, it is positive by definition,
thereby incorporating the restriction that $\pi > 0$. Second, in the case where positivity
of $\pi$ is obvious from the data ($\xi_2$ is very large relative to its standard deviation), it
is close to the natural plug-in estimator $1/\xi_2$. The second property is an immediate
consequence of a well-known approximation to the tail of the normal cdf, which is used
extensively in the literature on extreme value limit theorems for normal sequences and
processes (see Equation 1.5.4 in Leadbetter et al. (1983) and the remainder of that book
for applications). We discuss this further in Section 2.5.
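A minimal Python sketch of $\hat\tau$ from Lemma 2.1, using only the standard library, may help
make the formula concrete. Two choices here are ours, not the paper's: the Mills ratio
$(1-\Phi(x))/\phi(x)$ is evaluated on the log scale, with an asymptotic series for large $x$ to
avoid underflow, and unbiasedness is checked by Simpson's rule rather than analytically. The
function names are hypothetical.

```python
import math


def mills_ratio(x):
    """(1 - Phi(x)) / phi(x) for the standard normal, evaluated stably."""
    if x > 30.0:
        # Asymptotic series (1 - 1/x^2 + 3/x^4 - 15/x^6) / x; the omitted
        # term is below 2e-10 relative to 1/x once x > 30.
        x2 = x * x
        return (1.0 - 1.0 / x2 + 3.0 / x2 ** 2 - 15.0 / x2 ** 3) / x
    tail = 0.5 * math.erfc(x / math.sqrt(2.0))  # 1 - Phi(x)
    log_ratio = math.log(tail) + 0.5 * x * x + 0.5 * math.log(2.0 * math.pi)
    try:
        return math.exp(log_ratio)
    except OverflowError:  # x below roughly -37.6: the ratio exceeds float range
        return math.inf


def tau_hat(xi2, sigma2_sq):
    """Lemma 2.1: unbiased for 1/pi when xi2 ~ N(pi, sigma2_sq) and pi > 0."""
    sigma2 = math.sqrt(sigma2_sq)
    return mills_ratio(xi2 / sigma2) / sigma2


def expected_tau(pi, sigma2_sq=1.0, n=20_000):
    """E_pi[tau_hat] by Simpson's rule in x = xi2 / sigma2 ~ N(pi / sigma2, 1).

    The lower limit -37 keeps mills_ratio finite; the neglected tail is
    negligible once pi / sigma2 is not too small (say >= 0.5).
    """
    sigma2 = math.sqrt(sigma2_sq)
    mu = pi / sigma2
    lo, hi = -37.0, mu + 12.0
    h = (hi - lo) / n  # n must be even for Simpson's rule
    total = 0.0
    for j in range(n + 1):
        x = lo + j * h
        simpson_weight = 1 if j in (0, n) else (4 if j % 2 else 2)
        density = math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2.0 * math.pi)
        total += simpson_weight * tau_hat(x * sigma2, sigma2_sq) * density
    return total * h / 3.0


for pi in (0.5, 1.0, 2.0):
    print(pi, expected_tau(pi), 1.0 / pi)     # last two columns should agree
for xi2 in (0.5, 2.0, 10.0):
    print(xi2, tau_hat(xi2, 1.0), 1.0 / xi2)  # tau_hat < 1/xi2, gap shrinks
```

The two loops illustrate the properties discussed above: $\hat\tau$ averages to $1/\pi$, is
always positive, and approaches $1/\xi_2$ from below as $\xi_2$ grows relative to $\sigma_2$.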


2.2 Unbiased Estimation of $\beta$

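Given an unbiased estimator of $1/\pi$ which depends only on $\xi_2$, we can construct an
unbiased estimator of $\beta$ as suggested above. Moreover, this estimator is unique.

Theorem 2.1. Define

$$ \hat\beta_U(\xi, \Sigma) = \hat\tau(\xi_2, \sigma_2^2)\,\delta(\xi, \Sigma) + \frac{\sigma_{12}}{\sigma_2^2} = \frac{1}{\sigma_2}\,\frac{1 - \Phi(\xi_2/\sigma_2)}{\phi(\xi_2/\sigma_2)}\left(\xi_1 - \frac{\sigma_{12}}{\sigma_2^2}\,\xi_2\right) + \frac{\sigma_{12}}{\sigma_2^2}. $$

The estimator $\hat\beta_U(\xi, \Sigma)$ is unbiased for $\beta$ provided $\pi > 0$.

Moreover, if the parameter space $\Theta$ contains an open set then $\hat\beta_U(\xi, \Sigma)$ is the unique
non-randomized unbiased estimator for $\beta$, in the sense that any other estimator $\hat\beta(\xi, \Sigma)$
satisfying

$$ E_{\pi,\beta}\left[\hat\beta(\xi, \Sigma)\right] = \beta \quad \forall \pi \in \Pi, \; \beta \in B $$

also satisfies

$$ \hat\beta(\xi, \Sigma) = \hat\beta_U(\xi, \Sigma) \;\; \text{a.s.} \quad \forall \pi \in \Pi, \; \beta \in B. $$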
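The estimator in Theorem 2.1 is a one-line extension of $\hat\tau$. The Python sketch below is
illustrative only: the function names and the example values of $\xi$ and $\Sigma$ are
hypothetical, and the stable Mills-ratio evaluation is our own implementation choice, repeated
here so that the block runs on its own.

```python
import math


def mills_ratio(x):
    """(1 - Phi(x)) / phi(x) for the standard normal, evaluated stably."""
    if x > 30.0:
        x2 = x * x  # asymptotic series, accurate to about 1e-10 relative here
        return (1.0 - 1.0 / x2 + 3.0 / x2 ** 2 - 15.0 / x2 ** 3) / x
    tail = 0.5 * math.erfc(x / math.sqrt(2.0))  # 1 - Phi(x)
    try:
        return math.exp(math.log(tail) + 0.5 * x * x + 0.5 * math.log(2.0 * math.pi))
    except OverflowError:
        return math.inf


def beta_unbiased(xi1, xi2, s12, s22):
    """hat beta_U(xi, Sigma) from Theorem 2.1 with s12 = sigma_12, s22 = sigma_2^2.

    sigma_1^2 does not enter the formula, so it is not an argument.
    """
    sigma2 = math.sqrt(s22)
    tau = mills_ratio(xi2 / sigma2) / sigma2  # unbiased for 1/pi (Lemma 2.1)
    c = s12 / s22                             # sigma_12 / sigma_2^2
    delta = xi1 - c * xi2                     # delta(xi, Sigma) from (6)
    return tau * delta + c


def beta_2sls(xi1, xi2):
    """Conventional just-identified IV estimate xi1 / xi2."""
    return xi1 / xi2


# Hypothetical draws with sigma_1^2 = sigma_2^2 = 1 and sigma_12 = 0.5, so c = 0.5
for xi1, xi2 in [(2.0, 1.5), (2.0, -0.5), (60.0, 50.0)]:
    print(xi1, xi2, beta_unbiased(xi1, xi2, 0.5, 1.0), beta_2sls(xi1, xi2))
```

In the first pair the unbiased estimate lies between $\sigma_{12}/\sigma_2^2 = 0.5$ and the 2SLS
estimate; in the second, where $\xi_2 < 0$, the two estimates fall on opposite sides of 0.5; in
the third, with a strong first stage, they nearly coincide.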


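Note that the conventional IV estimator can be written as

$$ \hat\beta_{2SLS} = \frac{\xi_1}{\xi_2} = \frac{1}{\xi_2}\left(\xi_1 - \frac{\sigma_{12}}{\sigma_2^2}\,\xi_2\right) + \frac{\sigma_{12}}{\sigma_2^2}. $$

Thus, $\hat\beta_U$ differs from the conventional IV estimator only in that it replaces the plug-in
estimate $1/\xi_2$ for $1/\pi$ by the unbiased estimate $\hat\tau$. From results in e.g. Baricz (2008),
we have that $\hat\tau < 1/\xi_2$ for $\xi_2 > 0$, so when $\xi_2$ is positive $\hat\beta_U$ shrinks the conventional
IV estimator towards $\sigma_{12}/\sigma_2^2$.⁷ By contrast, when $\xi_2 < 0$, $\hat\beta_U$ lies on the opposite
side of $\sigma_{12}/\sigma_2^2$ from the conventional IV estimator. Interestingly, one can show that
the unbiased estimator is uniformly more likely to correctly sign $\beta - \sigma_{12}/\sigma_2^2$ than is the
conventional estimator, in the sense that for $\varphi(x) = 1\{x > 0\}$,

$$ P_{\pi,\beta}\left\{ \varphi\left(\hat\beta_U - \frac{\sigma_{12}}{\sigma_2^2}\right) = \varphi\left(\beta - \frac{\sigma_{12}}{\sigma_2^2}\right) \right\} \ge P_{\pi,\beta}\left\{ \varphi\left(\hat\beta_{2SLS} - \frac{\sigma_{12}}{\sigma_2^2}\right) = \varphi\left(\beta - \frac{\sigma_{12}}{\sigma_2^2}\right) \right\}, $$

with strict inequality at some points.⁸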

2.3 Risk and Moments of the Unbiased Estimator 

The uniqueness of $\hat\beta_U$ among nonrandomized estimators implies that $\hat\beta_U$ minimizes the
risk $E_{\pi,\beta}\left[\ell\left(\hat\beta(\xi, \Sigma) - \beta\right)\right]$ uniformly over $\pi, \beta$ and over the class of unbiased estimators
for any loss function $\ell$ such that randomization cannot reduce risk. In particular,
by Jensen's inequality $\hat\beta_U$ is uniformly minimum risk for any convex loss function $\ell$.
This includes absolute value loss as well as squared error loss or $L^p$ loss for any $p > 1$.
However, elementary calculations show that $|\hat\beta_U|$ has an infinite $p$th moment for $p > 1$.
Thus the fact that $\hat\beta_U$ has uniformly minimal risk implies that any unbiased estimator
must have an infinite $p$th moment for any $p > 1$. In particular, while $\hat\beta_U$ is the uniform
minimum mean absolute deviation unbiased estimator of $\beta$, it is minimum variance
unbiased only in the sense that all unbiased estimators have infinite variance. We
record this result in the following theorem.

Theorem 2.2. For $\varepsilon > 0$, the expectation of $|\hat\beta_U(\xi, \Sigma)|^{1+\varepsilon}$ is infinite for all $\pi, \beta$.
Moreover, if the parameter space $\Theta$ contains an open set then any unbiased estimator of $\beta$
has an infinite $1 + \varepsilon$ moment.

⁷ Under weak instrument asymptotics as in Staiger & Stock (1997) and homoskedastic errors, $\sigma_{12}/\sigma_2^2$
is the probability limit of the OLS estimator, though this does not in general hold under weaker
assumptions on the error structure.

⁸ This property is far from unique to the unbiased estimator, however.


2.4 Relation to Tests and Confidence Sets 


As we show in the next subsection, $\hat\beta_U$ is asymptotically equivalent to 2SLS when the
instruments are strong and so can be used together with conventional standard errors
in that case. Even when the instruments are weak, the conditioning approach of
Moreira (2003) yields valid conditional critical values for arbitrary test statistics and so can be
used to construct conditional t-tests based on $\hat\beta_U$ which control size. We note, however,
that optimal estimation and optimal testing are distinct questions in the context of weak
IV. For example, while $\hat\beta_U$ is uniformly minimum risk unbiased for convex loss, it follows from
the results of Moreira (2009) that the Anderson-Rubin test, rather than a conditional t-test
based on $\hat\beta_U$, is the uniformly most powerful unbiased two-sided test in the present
just-identified context.⁹ Since our focus in this paper is on estimation we do not
further pursue the question of optimal testing. However, properties of
tests based on unbiased estimators, particularly in contexts where the Anderson-Rubin
test is not uniformly most powerful unbiased (such as one-sided testing and testing in
the over-identified model of Section 3), are an interesting topic for future work.¹⁰

⁹ Moreira (2009) establishes this result in the model without a sign restriction, and it is
straightforward to show that the result continues to hold in the sign-restricted model.

¹⁰ Absent such results, we suggest reporting the Anderson-Rubin confidence set to accompany the
unbiased point estimate. As discussed in Section F.3, the 95% Anderson-Rubin confidence set contains
$\hat\beta_U$ with probability exceeding 97%, and with probability near 100% except when $\pi$ is extremely small.
2.5 Behavior of $\hat\beta_U$ When $\pi$ is Large


While the finite-sample unbiasedness of $\hat\beta_U$ is appealing, it is also natural to consider
performance when the instruments are highly informative. This situation, which we
will model by taking $\pi$ to be large, corresponds to the conventional strong-instrument
asymptotics where one fixes the data generating process and takes the sample size to
infinity.¹¹

As we discussed above, the unbiased and conventional IV estimators differ only in
that the former substitutes $\hat\tau(\xi_2, \sigma_2^2)$ for $1/\xi_2$. These two estimators for $1/\pi$ coincide
to a high order of approximation for large values of $\xi_2$. Specifically, as noted in
Small (2010) (Section 2.3.4), for $\xi_2 > 0$ we have

$$ 0 \le \frac{1}{\xi_2} - \hat\tau(\xi_2, \sigma_2^2) \le \frac{\sigma_2^2}{\xi_2^3}. $$

Thus, since $\xi_2 \to_p \infty$ as $\pi \to \infty$, the difference between $\hat\tau(\xi_2, \sigma_2^2)$ and $1/\xi_2$ converges
rapidly to zero (in probability) as $\pi$ grows. Consequently, the unbiased estimator $\hat\beta_U$
(appropriately normalized) has the same limiting distribution as the conventional IV
estimator $\hat\beta_{2SLS}$ as we take $\pi \to \infty$.
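The bound above is easy to check numerically. The Python sketch below (standard library only,
hypothetical helper names) compares $1/\xi_2 - \hat\tau(\xi_2, \sigma_2^2)$ with $\sigma_2^2/\xi_2^3$ for a few values of
$\xi_2$ with $\sigma_2^2 = 1$; the stable evaluation of the Mills ratio is our own implementation choice,
not taken from the paper.

```python
import math


def mills_ratio(x):
    """(1 - Phi(x)) / phi(x) for the standard normal, evaluated stably."""
    if x > 30.0:
        x2 = x * x  # asymptotic series, accurate to about 1e-10 relative here
        return (1.0 - 1.0 / x2 + 3.0 / x2 ** 2 - 15.0 / x2 ** 3) / x
    tail = 0.5 * math.erfc(x / math.sqrt(2.0))  # 1 - Phi(x)
    try:
        return math.exp(math.log(tail) + 0.5 * x * x + 0.5 * math.log(2.0 * math.pi))
    except OverflowError:
        return math.inf


def tau_hat(xi2, sigma2_sq):
    """Unbiased estimator of 1/pi from Lemma 2.1."""
    sigma2 = math.sqrt(sigma2_sq)
    return mills_ratio(xi2 / sigma2) / sigma2


def gap_and_bound(xi2, sigma2_sq):
    """Return (1/xi2 - tau_hat, sigma2_sq / xi2**3); intended for xi2 > 0."""
    return 1.0 / xi2 - tau_hat(xi2, sigma2_sq), sigma2_sq / xi2 ** 3


for xi2 in (1.0, 2.0, 5.0, 10.0, 50.0):
    gap, bound = gap_and_bound(xi2, 1.0)
    print(f"xi2 = {xi2:5.1f}   1/xi2 - tau_hat = {gap:.3e}   bound = {bound:.3e}")
```

The gap shrinks roughly like $1/\xi_2^3$, which is the sense in which the two estimates of $1/\pi$
coincide to a high order when the first stage is strong.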


Theorem 2.3. As $\pi \to \infty$, holding $\beta$ and $\Sigma$ fixed,

$$ \pi\left(\hat\beta_U - \hat\beta_{2SLS}\right) \to_p 0. $$

Consequently, $\hat\beta_U \to_p \beta$ and

$$ \pi\left(\hat\beta_U - \beta\right) \to_d N\left(0,\; \sigma_1^2 - 2\beta\sigma_{12} + \beta^2\sigma_2^2\right). $$

Thus, the unbiased estimator $\hat\beta_U$ behaves as the standard IV estimator for large
values of $\pi$. Consequently, one can show that using this estimator along with conventional
standard errors will yield asymptotically valid inference under strong-instrument
asymptotics. See Appendix B for details.

¹¹ Formally, in the finite-sample normal IV model (1), strong-instrument asymptotics will correspond
to fixing $\pi$ and taking $T \to \infty$, which under mild conditions on $Z$ and $\mathrm{Var}((U', V')')$ will result in
$\Sigma \to 0$. However, it is straightforward to show that the behavior of $\hat\beta_U$, $\hat\beta_{2SLS}$, and many other
estimators in this case will be the same as the behavior obtained by holding $\Sigma$ fixed and taking $\pi$ to
infinity. We focus on the latter case here to simplify the exposition. See Appendix B, which provides
asymptotic results with an unknown error distribution, for asymptotic results under $T \to \infty$.


3 Unbiased Estimation with Multiple Instruments 


We now consider the case with multiple instruments, where the model is given by (1)
and (2) with $k$ (the dimension of $Z_t$, $\pi$, $\xi_1$ and $\xi_2$) greater than 1. As in Section 1.2,
we assume that the sign of each element $\pi_i$ of the first stage vector is known, and we
normalize this sign to be positive, giving the parameter space (4).

Using the results in Section 2 one can construct an unbiased estimator for $\beta$ in many
different ways. For any index $i \in \{1, \ldots, k\}$, the unbiased estimator based on $(\xi_{1,i}, \xi_{2,i})$
will, of course, still be unbiased for $\beta$ when $k > 1$. One can also take non-random
weighted averages of the unbiased estimators based on different instruments. Using
the unbiased estimator based on a fixed linear combination of instruments is another
possibility, so long as the linear combination preserves the sign restriction. However,
such approaches will not adapt to information from the data about the relative strength
of instruments and so will typically be inefficient when the instruments are strong.

By contrast, the usual 2SLS estimator achieves asymptotic efficiency in the strongly
identified case (modeled here by taking $\|\pi\| \to \infty$) when errors are homoskedastic.¹² In
fact, in this case 2SLS is asymptotically equivalent to an infeasible estimator that uses
knowledge of $\pi$ to choose the optimal combination of instruments. Thus, a reasonable
goal is to construct an estimator that (1) is unbiased for fixed $\pi$ and (2) is asymptotically
equivalent to 2SLS as $\|\pi\| \to \infty$. In the remainder of this section we first
introduce a class of unbiased estimators and then show that a (feasible) estimator in
this class attains the desired strong IV efficiency property. Further, we show that in
the over-identified case it is possible to construct unbiased estimators which are robust
to small violations of the first stage sign restriction. Finally, we derive bounds on the
attainable risk of any unbiased estimator for finite $\|\pi\|$ and show that, while the unbiased
estimators described above achieve optimality in an asymptotic sense as $\|\pi\| \to \infty$ regardless
of the direction of $\pi$, the optimal unbiased estimator for finite $\pi$ will depend on the
direction of $\pi$.

¹² In the heteroskedastic case, the 2SLS estimator will no longer be asymptotically efficient, and
a two-step GMM estimator can be used to achieve the efficiency bound. Because it leads to simpler
exposition, and because the 2SLS estimator is common in practice, we consider asymptotic equivalence
with 2SLS, rather than asymptotic efficiency in the heteroskedastic case, as our goal. As discussed in
Appendix A.2, however, our approach generalizes directly to efficient estimators in non-homoskedastic
settings.

3.1 A Class of Unbiased Estimators

Let

$$ \xi(i) = \begin{pmatrix} \xi_{1,i} \\ \xi_{2,i} \end{pmatrix} \quad \text{and} \quad \Sigma(i) = \begin{pmatrix} \Sigma_{11,ii} & \Sigma_{12,ii} \\ \Sigma_{21,ii} & \Sigma_{22,ii} \end{pmatrix} $$

be the reduced form and first stage coefficients on the $i$th instrument and their variance
matrix, respectively, so that $\hat\beta_U(\xi(i), \Sigma(i))$ is the unbiased estimator based on the $i$th
instrument. Given a weight vector $w \in \mathbb{R}^k$ with $\sum_{i=1}^k w_i = 1$, let

$$ \hat\beta_w(\xi, \Sigma; w) = \sum_{i=1}^k w_i\, \hat\beta_U(\xi(i), \Sigma(i)). $$

Clearly, $\hat\beta_w$ is unbiased so long as $w$ is nonrandom. Allowing $w$ to depend on the data
$\xi$, however, may introduce bias through the dependence between the weights and the
estimators $\hat\beta_U(\xi(i), \Sigma(i))$.

To avoid this bias we first consider a randomized unbiased estimator and then take
its conditional expectation given the sufficient statistic $\xi$ to eliminate the randomization.
Let $\zeta \sim N(0, \Sigma)$ be independent of $\xi$, and let $\xi^{(a)} = \xi + \zeta$ and $\xi^{(b)} = \xi - \zeta$. Then $\xi^{(a)}$
and $\xi^{(b)}$ are (unconditionally) independent draws with the same marginal distribution
as $\xi$, save that $\Sigma$ is replaced by $2\Sigma$. If $T$ is even, $Z'Z$ is the same across the first and
second halves of the sample, and the errors are iid, then $\xi^{(a)}$ and $\xi^{(b)}$ have the same
joint distribution as the reduced form estimators based on the first and second half of
the sample. Thus, we can think of these as split-sample reduced-form estimates.

Let w = w(£^) be a vector of data dependent weights with '£2i=i'&i = 1- By the 
independence of and 


2S; tf« <4) ))l =J2 E M« w )] E 2E(i))l = P- (7) 


2 — 1 


To eliminate the noise introduced by £, define the “Rao-Blackwellized” estimator 


Prb = Prb{£, V w) = E f / i (£ (a) , 2 S; w (^)) 


This gives a class of unbiased estimators, where the estimator depends on the choice of the weight $w$. Unbiasedness of $\hat\beta_{RB}$ follows immediately from (7) and the law of iterated expectations. While $\hat\beta_{RB}$ does not, to our knowledge, have a simple closed form, it can be computed by integrating over the distribution of $\zeta$. This can easily be done by simulation, taking the sample average of $\hat\beta_w$ over simulated draws of $\xi^{(a)}$ and $\xi^{(b)}$ while holding $\xi$ at its observed value.
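The simulation recipe in the last paragraph can be sketched in Python as below. This is a minimal illustration under our own assumptions, not the authors' code: $\xi$ is stored as the stacked vector $(\xi_1',\xi_2')'$, `beta_U` implements the single-instrument estimator $\hat\tau(\xi_2,\sigma_2^2)(\xi_1-\sigma_{12}\xi_2/\sigma_2^2)+\sigma_{12}/\sigma_2^2$ used in the proofs of Appendix A, and the names `beta_U`, `beta_RB` and `weight_fn` are hypothetical. The weight function is assumed to be vectorized, mapping an $(S,2k)$ array of draws of $\xi^{(b)}$ to an $(S,k)$ array of weights whose rows sum to one.

```python
import numpy as np
from scipy.special import erfcx


def beta_U(xi1, xi2, s12, s22):
    """Single-instrument unbiased estimator, vectorized over numpy arrays.

    tau_hat = (1 / sigma_2) * (1 - Phi(x)) / phi(x) with x = xi2 / sigma_2, computed
    through the scaled complementary error function so it stays finite for large |x|.
    """
    tau = np.sqrt(np.pi / 2) * erfcx(xi2 / np.sqrt(2 * s22)) / np.sqrt(s22)
    return tau * (xi1 - s12 / s22 * xi2) + s12 / s22


def beta_RB(xi, Sigma, weight_fn, n_draws=1000, seed=None):
    """Simulation approximation to E[beta_w(xi^(a), 2 Sigma; w(xi^(b))) | xi].

    xi is the stacked 2k-vector (xi_1', xi_2')' and Sigma its 2k x 2k variance.
    weight_fn (hypothetical) maps (S, 2k) draws of xi^(b) to (S, k) weights.
    """
    rng = np.random.default_rng(seed)
    k = xi.size // 2
    zeta = rng.multivariate_normal(np.zeros(2 * k), Sigma, size=n_draws)  # (S, 2k)
    xi_a, xi_b = xi + zeta, xi - zeta  # independent, each with variance 2 * Sigma
    s12 = 2 * np.diag(Sigma[:k, k:])  # doubled Sigma_{12,ii}
    s22 = 2 * np.diag(Sigma)[k:]  # doubled Sigma_{22,ii}
    est = beta_U(xi_a[:, :k], xi_a[:, k:], s12, s22)  # (S, k) single-instrument estimates
    w = weight_fn(xi_b)  # weights depend on xi^(b) only
    return float(np.mean(np.sum(w * est, axis=1)))
```

With nonrandom weights, for instance all weight on the first instrument, the simulated average should approach $\hat\beta_U(\xi(1),\Sigma(1))$ as `n_draws` grows, consistent with the observation in Section 3.2 that Rao-Blackwellizing with fixed weights $w^*$ returns $\hat\beta_w(\xi,\Sigma;w^*)$. The split into $\xi^{(a)}$ and $\xi^{(b)}$ only matters once the weights depend on the data.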


3.2 Equivalence with 2SLS under Strong IV Asymptotics 

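We now propose a set of weights $w$ which yield an unbiased estimator asymptotically equivalent to 2SLS. To motivate these weights, note that for $W = Z'Z$ and $e_i$ the $i$th standard basis vector, the 2SLS estimator can be written as

$$\hat\beta_{2SLS} = \frac{\xi_2'W\xi_1}{\xi_2'W\xi_2} = \sum_{i=1}^k \frac{\xi_2'We_ie_i'\xi_2}{\xi_2'W\xi_2}\cdot\frac{\xi_{1,i}}{\xi_{2,i}},$$

which is the GMM estimator with weight matrix $W = Z'Z$. Thus, the 2SLS estimator is a weighted average of the 2SLS estimates based on single instruments, where the weight for the estimate $\xi_{1,i}/\xi_{2,i}$ based on instrument $i$ is equal to $\xi_2'We_ie_i'\xi_2/\xi_2'W\xi_2$. This suggests the unbiased Rao-Blackwellized estimator with weights $\hat w^*_i(\xi^{(b)}) = \frac{\xi_2^{(b)\prime}We_ie_i'\xi_2^{(b)}}{\xi_2^{(b)\prime}W\xi_2^{(b)}}$:

$$\hat\beta^*_{RB} = \hat\beta_{RB}(\xi,\Sigma;\hat w^*) = E\left[\hat\beta_w\left(\xi^{(a)},2\Sigma;\hat w^*(\xi^{(b)})\right)\,\middle|\,\xi\right]. \qquad (8)$$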
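The decomposition above is easy to check numerically. The sketch below (hypothetical function names, arbitrary simulated inputs, and $W$ assumed symmetric) computes the GMM form $\xi_2'W\xi_1/\xi_2'W\xi_2$ and the weighted average of the single-instrument ratios $\xi_{1,i}/\xi_{2,i}$. Evaluating `single_iv_weights` at $\xi_2^{(b)}$ gives $\hat w^*(\xi^{(b)})$, and evaluating it at $\pi$ gives the oracle weights used in the discussion of Theorem 3.1.

```python
import numpy as np


def gmm_estimate(xi1, xi2, W):
    """GMM estimate xi_2' W xi_1 / xi_2' W xi_2 (2SLS when W = Z'Z)."""
    return float(xi2 @ W @ xi1) / float(xi2 @ W @ xi2)


def single_iv_weights(xi2, W):
    """w_i = xi_2' W e_i e_i' xi_2 / xi_2' W xi_2 for symmetric W; entries sum to one."""
    num = (W @ xi2) * xi2  # i-th entry is (W xi_2)_i * xi_{2,i}
    return num / num.sum()  # num.sum() equals xi_2' W xi_2


def weighted_single_iv(xi1, xi2, W):
    """Weighted average of the single-instrument estimates xi_{1,i} / xi_{2,i}."""
    return float(np.sum(single_iv_weights(xi2, W) * xi1 / xi2))
```

For any draw with nonzero $\xi_{2,i}$, `gmm_estimate(xi1, xi2, W)` and `weighted_single_iv(xi1, xi2, W)` should agree up to floating-point error. Note that the weights sum to one but need not all be positive when $W$ has nonzero off-diagonal elements.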
The following theorem shows that $\hat\beta^*_{RB}$ is asymptotically equivalent to $\hat\beta_{2SLS}$ in the strongly identified case, and is therefore asymptotically efficient if the errors are iid.

Theorem 3.1. Let $\|\pi\|\to\infty$ with $\|\pi\|/\min_i\pi_i = O(1)$. Then $\|\pi\|(\hat\beta^*_{RB}-\hat\beta_{2SLS})\to_p 0$.

The condition that $\|\pi\|/\min_i\pi_i = O(1)$ amounts to an assumption that the “strength” of all instruments is of the same order. As discussed below in Section 3.3, this assumption can be relaxed by redefining the instruments.

To understand why Theorem 3.1 holds, consider the “oracle” weights $w^*_i = \frac{\pi'We_ie_i'\pi}{\pi'W\pi}$. It is easy to see that $\hat w^*-w^*\to_p 0$ as $\|\pi\|\to\infty$. Consider the oracle unbiased estimator $\hat\beta^o_{RB} = \hat\beta_{RB}(\xi,\Sigma;w^*)$, and the oracle combination of individual 2SLS estimators $\hat\beta^o_{2SLS} = \sum_{i=1}^k w^*_i\,\xi_{1,i}/\xi_{2,i}$. By arguments similar to those used to show that statistical noise in the first stage estimates does not affect the 2SLS asymptotic distribution under strong instrument asymptotics, it can be seen that $\|\pi\|(\hat\beta^o_{2SLS}-\hat\beta_{2SLS})\to_p 0$ as $\|\pi\|\to\infty$. Further, one can show that $\hat\beta^o_{RB} = \hat\beta_w(\xi,\Sigma;w^*) = \sum_{i=1}^k w^*_i\hat\beta_U(\xi(i),\Sigma(i))$. Since this is just $\hat\beta^o_{2SLS}$ with $\hat\beta_U(\xi(i),\Sigma(i))$ replacing $\xi_{1,i}/\xi_{2,i}$, it follows by Theorem 2.3 that $\|\pi\|(\hat\beta^o_{RB}-\hat\beta^o_{2SLS})\to_p 0$. Theorem 3.1 then follows by showing that $\|\pi\|(\hat\beta^*_{RB}-\hat\beta^o_{RB})\to_p 0$, which holds for essentially the same reasons that first stage noise does not affect the asymptotic distribution of the 2SLS estimator, but requires some additional argument. We refer the interested reader to the proof of Theorem 3.1 in Appendix A for details.

3.3 Robust Unbiased Estimation 

So far, all the unbiased estimators we have discussed required $\pi_i > 0$ for all $i$. Even when the first stage sign is dictated by theory, however, we may be concerned that this restriction may fail to hold exactly in a given empirical context. To address such concerns, in this section we show that in over-identified models we can construct estimators which are robust to small violations of the sign restriction. Our approach has the further benefit of ensuring asymptotic efficiency when, while $\|\pi\|\to\infty$, the elements $\pi_i$ increase at different rates.
Let $M$ be a $k\times k$ invertible matrix such that all elements are strictly positive, and

$$\tilde\xi = (I_2\otimes M)\xi,\qquad \tilde\Sigma = (I_2\otimes M)\Sigma(I_2\otimes M)',\qquad \tilde W = M^{-1\prime}WM^{-1}.$$

The GMM estimator based on $\tilde\xi$ and $\tilde W$ is numerically equivalent to the GMM estimator based on $\xi$ and $W$. In particular, for many choices of $W$, including all those discussed above, estimation based on $(\tilde\xi,\tilde W,\tilde\Sigma)$ is equivalent to estimation based on instruments $ZM^{-1}$ rather than $Z$.

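Note that for $\tilde\pi = M\pi$, $\tilde\xi$ is normally distributed with mean $(\tilde\pi'\beta,\tilde\pi')'$ and variance $\tilde\Sigma$. Thus, if we construct the estimator $\hat\beta^*_{RB}$ from $(\tilde\xi,\tilde W,\tilde\Sigma)$ instead of $(\xi,W,\Sigma)$, we obtain an unbiased estimator provided $\tilde\pi_i > 0$ for all $i$. Since all elements of $M$ are strictly positive, this is a strictly weaker condition than $\pi_i > 0$ for all $i$. By Theorem 3.1, $\hat\beta^*_{RB}$ constructed from $\tilde\xi$ and $\tilde W$ will be asymptotically efficient as $\|\pi\|\to\infty$ so long as $\tilde\pi = M\pi$ is nonnegative and satisfies $\|\tilde\pi\|/\min_i\tilde\pi_i = O(1)$. Note, however, that when the elements of $\pi$ are nonnegative,

$$\min_i\tilde\pi_i \ \ge\ \left(\min_{i,j}M_{ij}\right)\mathbf{1}'\pi \ =\ \left(\min_{i,j}M_{ij}\right)\|\pi\|_1 \ \ge\ \left(\min_{i,j}M_{ij}\right)\|\pi\|,$$

so, since $\|\tilde\pi\|\le\|M\|\,\|\pi\|$, the condition $\|\tilde\pi\|/\min_i\tilde\pi_i = O(1)$ now holds automatically along any sequence with $\|\pi\|\to\infty$.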
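The numerical equivalence claimed above follows from $\tilde\xi_2'\tilde W\tilde\xi_1 = \xi_2'M'M^{-1\prime}WM^{-1}M\xi_1 = \xi_2'W\xi_1$, and likewise for the denominator. The Python sketch below checks this, together with the chain of inequalities for a nonnegative $\pi$, on randomly generated inputs; the function names and the random design are illustrative assumptions only.

```python
import numpy as np


def gmm_estimate(xi1, xi2, W):
    """GMM estimate xi_2' W xi_1 / xi_2' W xi_2."""
    return float(xi2 @ W @ xi1) / float(xi2 @ W @ xi2)


def transform(xi, Sigma, W, M):
    """Map (xi, Sigma, W) to (xi_tilde, Sigma_tilde, W_tilde) for an invertible k x k M."""
    L = np.kron(np.eye(2), M)  # I_2 (x) M acting on the stacked (xi_1', xi_2')'
    M_inv = np.linalg.inv(M)
    return L @ xi, L @ Sigma @ L.T, M_inv.T @ W @ M_inv
```

Applying `gmm_estimate` to the first and second halves of the transformed vector with `W_tilde` reproduces the untransformed estimate; the transformation only changes which sign restriction, $\tilde\pi_i > 0$ rather than $\pi_i > 0$, the unbiased construction relies on.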

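Conducting estimation based on $\tilde\xi$ and $\tilde W$ offers a number of advantages for many different choices of $M$. One natural class of transformations $M$ is

$$M = \begin{pmatrix} 1 & c & c & \cdots & c\\ c & 1 & c & \cdots & c\\ c & c & 1 & \cdots & c\\ \vdots & & & \ddots & \vdots\\ c & c & c & \cdots & 1\end{pmatrix}\mathrm{Diag}(\Sigma_{22})^{-1/2} \qquad (9)$$

for $c\in[0,1)$ and $\mathrm{Diag}(\Sigma_{22})$ the matrix with the same diagonal as $\Sigma_{22}$ and zeros elsewhere. For a given $c$, denote the estimator $\hat\beta^*_{RB}$ based on the corresponding $(\tilde\xi,\tilde W,\tilde\Sigma)$ by $\hat\beta^*_{RB,c}$. One can show that $\hat\beta^*_{RB,0} = \hat\beta^*_{RB}$ based on $(\xi,W,\Sigma)$, and going forward we let $\hat\beta^*_{RB}$ denote $\hat\beta^*_{RB,0}$.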
We can interpret $c$ as specifying a level of robustness to violations of the sign restriction for $\pi$. In particular, for a given choice of $c$, $\tilde\pi$ will satisfy the sign restriction provided that for each $i$,

$$-\pi_i/\sqrt{\Sigma_{22,ii}} \ <\ c\cdot\sum_{j\neq i}\pi_j/\sqrt{\Sigma_{22,jj}},$$

that is, provided the expected z-statistic for testing that each wrong-signed $\pi_i$ is equal to zero is less than $c$ times the sum of the expected z-statistics for $j\neq i$. Larger values of $c$ provide a greater degree of robustness to violations of the sign restriction, while all choices of $c\in(0,1)$ yield asymptotically equivalent estimators as $\|\pi\|\to\infty$. For finite values of $\pi$, however, different choices of $c$ yield different estimators, so we explore the effects of different choices below using the Angrist & Krueger (1991) dataset. Determining the optimal choice of $c$ for finite values of $\pi$ is an interesting topic for future research.
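To make the construction in (9) and the robustness condition concrete, the sketch below builds $M$ for a given $c$ and checks the displayed condition. The condition is the same as requiring every element of $\tilde\pi = M\pi$ to be positive, because $\tilde\pi_i = \pi_i/\sqrt{\Sigma_{22,ii}} + c\sum_{j\neq i}\pi_j/\sqrt{\Sigma_{22,jj}}$. The function names and the example $\pi$ are hypothetical.

```python
import numpy as np


def make_M(c, Sigma22):
    """Matrix in (9): ones on the diagonal, c off the diagonal, times Diag(Sigma22)^(-1/2)."""
    k = Sigma22.shape[0]
    C = np.full((k, k), float(c))
    np.fill_diagonal(C, 1.0)
    return C @ np.diag(1.0 / np.sqrt(np.diag(Sigma22)))


def sign_restriction_holds(pi, Sigma22, c):
    """Check -z_i < c * sum_{j != i} z_j for all i, with z_i = pi_i / sqrt(Sigma22_ii)."""
    z = pi / np.sqrt(np.diag(Sigma22))
    others = z.sum() - z  # sum over j != i
    return bool(np.all(-z < c * others))
```

For example, with $\Sigma_{22} = I$ and $\pi = (1, 1, -0.3)'$, the condition holds for $c = 0.5$ (since $0.3 < 0.5\cdot 2$) but fails for $c = 0.1$ (since $0.3 > 0.1\cdot 2$), and in each case it agrees with checking `np.all(make_M(c, Sigma22) @ pi > 0)`.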


3.4 Bounds on the Attainable Risk 

While the class of estimators given above has the desirable property of asymptotic efficiency as $\|\pi\|\to\infty$, it is useful to have a benchmark for their performance for finite $\pi$. In Appendix D we derive a lower bound for the risk of any unbiased estimator at a given $(\pi^*,\beta^*)$. The bound is based on the risk in a submodel with a single instrument and, as in the single instrument case, shows that any unbiased estimator must have an infinite $1+\varepsilon$ absolute moment for $\varepsilon > 0$. In certain cases, which include large parts of the parameter space under homoskedastic errors $(U_t,V_t)$, the bound can be attained. The estimator that attains the bound turns out to depend on the value $\pi^*$, which shows that no uniform minimum risk unbiased estimator exists. See Appendix D for details.


4 Simulations 

In this section we present simulation results on the performance of our unbiased estimators. We study the model with normal errors and known reduced-form variance. We first consider models with a single instrument and then turn to over-identified models. Since the parameter space in the single-instrument model is small, we are able to obtain comprehensive simulation results in this case, studying performance over a wide range of parameter values. In the over-identified case, by contrast, the parameter space is too large to explore comprehensively by simulation, so we instead calibrate our simulations to the Staiger & Stock (1997) specifications for the Angrist & Krueger (1991) dataset.


4.1 Performance with a Single Instrument 

The estimator $\hat\beta_U$ based on a single instrument plays a central role in all of our results, so in this section we examine the performance of this estimator in simulation. For purposes of comparison we also discuss results for the two-stage least squares estimator $\hat\beta_{2SLS}$. The lack of moments for $\hat\beta_{2SLS}$ in the just-identified context renders some comparisons with $\hat\beta_U$ infeasible, however, so we also consider the performance of the Fuller (1977) estimator with constant one,

$$\hat\beta_{FULL} = \frac{\xi_2\xi_1+\sigma_{12}}{\xi_2^2+\sigma_2^2},$$

which we define as in Mills et al. (2014).¹³ Note that in the just-identified case considered here $\hat\beta_{FULL}$ also coincides with the bias-corrected 2SLS estimator (again, see Mills et al. 2014).

While the model (2) has five parameters in the single-instrument case, $(\beta,\pi,\sigma_1^2,\sigma_{12},\sigma_2^2)$, an equivariance argument implies that for our purposes it suffices to fix $\beta = 0$, $\sigma_1^2 = \sigma_2^2 = 1$ and consider the parameter space $(\pi,\sigma_{12})\in(0,\infty)\times[0,1)$. See Appendix E for details. Since this parameter space is just two-dimensional, we can fully explore it via simulation.


13 In the case where $U_t$ and $V_t$ are correlated or heteroskedastic across $t$, the definition of $\hat\beta_{FULL}$ above is the natural extension of the definition considered in Mills et al. (2014).
4.1.1 Estimator Location


We first compare the bias of $\hat\beta_U$ and $\hat\beta_{FULL}$ (we omit $\hat\beta_{2SLS}$ from this comparison, as it does not have a mean in the just-identified case). We consider $\sigma_{12}\in\{0.1, 0.5, 0.95\}$ and examine a wide range of values for $\pi > 0$.¹⁴ These results are plotted in the first panel of Figure 1.

If rather than mean bias we instead consider median bias, we find that $\hat\beta_U$ and $\hat\beta_{2SLS}$ generally exhibit smaller median bias than $\hat\beta_{FULL}$. There is no ordering between $\hat\beta_U$ and $\hat\beta_{2SLS}$ in terms of median bias, however, as the median bias of $\hat\beta_U$ is smaller than that of $\hat\beta_{2SLS}$ for very small values of $\pi$, while the median bias of $\hat\beta_{2SLS}$ is smaller for larger values of $\pi$. A plot of median bias is given in Appendix F.1.


4.1.2 Estimator Absolute Deviation 

We examine the distribution of the absolute deviation of each estimator from the true parameter value. The last three panels of Figure 1 plot the 10th, 50th, and 90th percentiles of absolute deviation of the estimators considered from the true value $\beta$ for three values of $\sigma_{12}$. We plot the log quantiles of absolute deviation (or equivalently the quantiles of log absolute deviation) for the sake of visibility. Here, and in additional unreported simulation results, we find that $\hat\beta_U$ has smaller median absolute deviation than $\hat\beta_{2SLS}$ uniformly over the parameter space. The 10th and 90th percentiles of the absolute deviation are also lower for $\hat\beta_U$ than for $\hat\beta_{2SLS}$ over much of the parameter space, though there is not a uniform ranking for all percentiles. The Fuller estimator has low median absolute deviation over much of the parameter space, but performs worse than both $\hat\beta_U$ and $\hat\beta_{2SLS}$ in certain cases, such as when $\sigma_{12} = 0.95$ and the first stage coefficient is small. Turning to mean absolute deviation, we find that the mean absolute deviation of $\hat\beta_U$ from $\beta$ exceeds that of $\hat\beta_{FULL}$ except in cases with very high $\sigma_{12}$ and small $\pi$, while, as already noted, the mean absolute deviation of $\hat\beta_{2SLS}$ is infinite.

14 We restrict attention to $\pi > 0.16$ in the bias plots. Since the first stage F-statistic is $F = \xi_2^2/\sigma_2^2$ in the present context, this corresponds to $E[F] > 1.026$. The expectation of $\hat\beta_U$ ceases to exist at $\pi = 0$, and for $\pi$ close to zero the heavy tails of $\hat\beta_U$ make computing the expectation very difficult. Indeed, we use numerical integration rather than Monte Carlo integration here because it allows us to consider smaller values of $\pi$. We thank an anonymous referee for this suggestion.

Thus, over much of the parameter space the unbiased estimator is more concentrated around the true parameter value than the 2SLS estimator, according to a variety of different measures of concentration. It would be interesting to decompose the deviations from the true parameter value into bias and variance components. Unfortunately, however, the lack of second moments of both the 2SLS and unbiased estimators means that the variance is infinite in both cases and therefore does not yield a useful comparison. To get around this, we consider the distribution of the absolute deviation of each estimator from the median of the estimator as a location-free measure of dispersion. In Appendix F.2 we examine this numerically and find a stochastic dominance relation in which the unbiased estimator is less dispersed than the 2SLS estimator and more dispersed than the Fuller estimator uniformly over the parameter space.
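A small Monte Carlo along the lines of Section 4.1 can be sketched as follows, using the normalization $\beta = 0$, $\sigma_1^2 = \sigma_2^2 = 1$ described above. This is an illustrative sketch rather than the code behind Figure 1: the function names, default number of draws, and seed are our own choices, and its output should not be read as reproducing the paper's results. It returns the 10th, 50th and 90th percentiles of $|\hat\beta-\beta|$ for $\hat\beta_U$, $\hat\beta_{2SLS} = \xi_1/\xi_2$ and $\hat\beta_{FULL}$.

```python
import numpy as np
from scipy.special import erfcx


def beta_U(xi1, xi2, s12, s22):
    """Single-instrument unbiased estimator (stable Mills-ratio form via erfcx)."""
    tau = np.sqrt(np.pi / 2) * erfcx(xi2 / np.sqrt(2 * s22)) / np.sqrt(s22)
    return tau * (xi1 - s12 / s22 * xi2) + s12 / s22


def beta_fuller(xi1, xi2, s12, s22):
    """Fuller estimator with constant one: (xi2 xi1 + sigma_12) / (xi2^2 + sigma_2^2)."""
    return (xi2 * xi1 + s12) / (xi2 ** 2 + s22)


def abs_dev_quantiles(pi, s12, n=1_000_000, seed=0, q=(10, 50, 90)):
    """Percentiles of |beta_hat - beta| with beta = 0 and sigma_1^2 = sigma_2^2 = 1."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, s12], [s12, 1.0]])
    e = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    xi1, xi2 = e[:, 0], pi + e[:, 1]  # beta = 0, so E[xi1] = pi * beta = 0
    ests = {
        "unbiased": beta_U(xi1, xi2, s12, 1.0),
        "2sls": xi1 / xi2,
        "fuller": beta_fuller(xi1, xi2, s12, 1.0),
    }
    return {name: np.percentile(np.abs(b), q) for name, b in ests.items()}
```

Quantiles are used rather than means because $\hat\beta_{2SLS}$ has no mean in this model and $\hat\beta_U$ lacks a $1+\varepsilon$ moment, so simulated averages would not settle down.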


4.2 Performance with Multiple Instruments 


In models with multiple instruments, if we assume that errors are homoskedastic, an equivariance argument closely related to that in the just-identified case again allows us to reduce the dimension of the parameter space. Unlike in the just-identified case, however, the matrix $Z'Z$ and the direction of the first stage, $\pi/\|\pi\|$, continue to matter (see Appendix E for details). As a result, the parameter space is too large to fully explore by simulation, so we instead calibrate our simulations to the Staiger & Stock (1997) specifications for the 1930-1939 cohort in the Angrist & Krueger (1991) data. While there is statistically significant heteroskedasticity in this data, this significance appears to be the result of the large sample size rather than substantively important deviations from homoskedasticity. In particular, procedures which assume homoskedasticity produce very similar answers to heteroskedasticity-robust procedures when applied to this data. Thus, given that homoskedasticity leads to a reduction of the parameter space as discussed above, we impose homoskedasticity in our simulations.

Figure 1: The first panel plots the bias of single-instrument estimators, calculated by numerical integration, against the mean $E[F]$ of the first-stage F-statistic. The remaining panels plot log quantiles of absolute deviation from the true value of $\beta$ for the unbiased estimator, 2SLS, and Fuller, for three values of $\sigma_{12}$. The lines corresponding to the median are plotted without markers, while the lines corresponding to the 90th and 10th percentiles are plotted with upward and downward pointing triangles, respectively. The absolute deviation results are based on 10 million simulation draws.


In each of the four Staiger & Stock specifications we estimate $\pi/\|\pi\|$ and $Z'Z$ from the data (ensuring, as discussed in Appendix G, that $\pi/\|\pi\|$ satisfies the sign restriction). After reducing the parameter space by equivariance and calibrating $Z'Z$ and $\pi/\|\pi\|$ to the data, the model has two remaining free parameters: the norm of the first stage, $\|\pi\|$, and the correlation $\sigma_{UV}$ between the reduced-form and first-stage errors. We examine behavior for a range of values for $\|\pi\|$ and for $\sigma_{UV}\in\{0.1, 0.5, 0.95\}$. Further details on the simulation design are given in Appendix G.

For each parameter value we simulate the performance of $\hat\beta_{2SLS}$, $\hat\beta_{FULL}$ (which is again the Fuller estimator with constant equal to one), and $\hat\beta^*_{RB}$ as defined in Section 3.2. We also consider the robust estimators $\hat\beta^*_{RB,c}$ discussed in Section 3.3 for $c\in\{0.1, 0.5, 0.9\}$, but find that all three choices produce very similar results and so focus on $c = 0.5$ to simplify the graphs.¹⁵ Even with a million simulation replications, simulation estimates of the bias for the unbiased estimators (which we know to be zero from the results of Section 3) remain noisy relative to, e.g., the bias of 2SLS in some calibrations, so we do not plot the bias estimates and instead focus on the mean absolute deviation (MAD) $E_{\pi,\beta}\left[|\hat\beta-\beta|\right]$, since, unlike in the just-identified case, the MAD for 2SLS is now finite. We also plot the lower bound on the mean absolute deviation of unbiased estimators discussed in Section 3.4. The results are plotted in Figure 2.

Several features become clear from these results. As expected, the performance of 2SLS is typically worse for models with more instruments or with a higher degree of correlation between the reduced-form and first-stage errors (i.e. higher $\sigma_{UV}$). The robust unbiased estimator $\hat\beta^*_{RB,0.5}$ generally outperforms $\hat\beta^*_{RB} = \hat\beta^*_{RB,0}$. Since the estimators with $c = 0.1$ and $c = 0.9$ perform very similarly to that with $c = 0.5$, they outperform $\hat\beta^*_{RB}$ as well. The gap in performance between the RB estimators and the lower bound on MAD over the class of all unbiased estimators is typically larger in specifications with more instruments. Interestingly, we see that the Fuller estimator often performs quite well, and has MAD close to or below the lower bound for the class of unbiased estimators in most designs. While this estimator is biased, its bias decreases quickly in $\|\pi\|$ in the designs considered. Thus, at least in the homoskedastic case, this estimator seems a potentially appealing choice if we are willing to accept bias for small values of $\|\pi\|$.

15 All results for the RB estimators are based on 1,000 draws of $\zeta$.

Figure 2: Mean absolute deviation of estimators in simulations calibrated to specifications I-IV of Staiger & Stock (1997). These specifications have k = 3, 30, 28, and 178 instruments, respectively. Results for specifications I-III are based on 1 million simulation draws, while results for specification IV are based on 100,000 simulation draws.


5 Empirical Applications 


We calculate our proposed estimators in two empirical applications. First, we consider the data and specifications used in Hornung (2014) to examine the effect of seventeenth century migrations on productivity. For our second application, we study the Staiger & Stock (1997) specifications for the Angrist & Krueger (1991) dataset on the relationship between education and labor market earnings. Before continuing, we present a step-by-step description of the implementation of our estimators.


5.1 Implementation 

To describe the implementation in a general setup, we introduce additional notation to explicitly allow for additional exogenous variables (such as a constant). We have observations $t = 1,\ldots,T$ with $Y_t$ a scalar outcome variable, $X_t$ a scalar endogenous variable, $Z_t$ a $k\times 1$ vector of instruments and $W_t$ a vector of additional control variables. Let $Y = (Y_1,\ldots,Y_T)'$, $X = (X_1,\ldots,X_T)'$, $Z = (Z_1,\ldots,Z_T)'$ and $W = (W_1,\ldots,W_T)'$. Let $\tilde Y = (I-W(W'W)^{-1}W')Y$, $\tilde X = (I-W(W'W)^{-1}W')X$ and $\tilde Z = (I-W(W'W)^{-1}W')Z$ denote the residuals from regressing $Y$, $X$ and $Z$ on $W$, as described in the introduction.

Our estimates are obtained using the following steps. 

1.) Let $\hat\xi_1$ and $\hat\xi_2$ denote the estimates of the coefficient on $Z_t$ in the regressions of $Y_t$ and $X_t$, respectively, on $Z_t$ and $W_t$, and let $\hat U_t$ and $\hat V_t$ denote residuals from these regressions. Let $\hat\Sigma$ denote an estimate of the variance-covariance matrix of $(\hat\xi_1',\hat\xi_2')'$. If the observations are independent (but possibly heteroskedastic), we can use the heteroskedasticity-robust estimate

$$\hat\Sigma = \left(I_2\otimes(\tilde Z'\tilde Z)^{-1}\right)\left(\sum_{t=1}^T\begin{pmatrix}\hat U_t^2\tilde Z_t\tilde Z_t' & \hat U_t\hat V_t\tilde Z_t\tilde Z_t'\\ \hat U_t\hat V_t\tilde Z_t\tilde Z_t' & \hat V_t^2\tilde Z_t\tilde Z_t'\end{pmatrix}\right)\left(I_2\otimes(\tilde Z'\tilde Z)^{-1}\right).$$

We use this estimate in our application based on Angrist & Krueger (1991), while for our application based on Hornung (2014) we follow Hornung and use a clustering-robust variance estimator. Likewise, in time-series contexts one could use a serial-correlation robust variance estimator here, e.g. Newey & West (1987).

2.) In the case of a single instrument (so $Z_t$ is scalar), the estimate is given by $\hat\beta_U(\hat\xi,\hat\Sigma)$, where $\hat\beta_U(\cdot,\cdot)$ is defined in Theorem 2.1.

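3.) In the case with $k > 1$ instruments, let $\hat\Sigma_{22}$ denote the lower-right $k\times k$ submatrix of $\hat\Sigma$, and let $M$ be the matrix given in (9) with $\Sigma_{22}$ replaced by $\hat\Sigma_{22}$ for some choice of $c$ between 0 and 1 (we find that $c = 0.5$ works well in our Monte Carlo simulations). Let $\tilde\xi = (I_2\otimes M)\hat\xi$ and $\tilde\Sigma = (I_2\otimes M)\hat\Sigma(I_2\otimes M)'$. Let $\tilde\Sigma(i)$ denote the $2\times 2$ symmetric matrix with diagonal elements given by the $(i,i)$ and $(k+i,k+i)$ elements of $\tilde\Sigma$, respectively, and off-diagonal element given by the $(i,k+i)$ element of $\tilde\Sigma$. Generate $S$ independent $N(0,\tilde\Sigma)$ vectors $\zeta_1,\ldots,\zeta_S$. Let $\tilde\xi_1$ and $\zeta_{s,1}$ denote the $k\times 1$ vectors with elements 1 through $k$ of $\tilde\xi$ and $\zeta_s$, respectively, and let $\tilde\xi_2$ and $\zeta_{s,2}$ denote the $k\times 1$ vectors with elements $k+1$ through $2k$ of $\tilde\xi$ and $\zeta_s$, respectively. Let $\tilde\xi(i) = (\tilde\xi_{1,i},\tilde\xi_{2,i})'$ and let $\zeta_s(i) = (\zeta_{s,1,i},\zeta_{s,2,i})'$. Let

$$\hat\beta_s = \sum_{i=1}^k w_{i,s}\,\hat\beta_U\left(\tilde\xi(i)+\zeta_s(i),\ 2\tilde\Sigma(i)\right),$$

where $\hat\beta_U(\cdot,\cdot)$ is defined in Theorem 2.1 and

$$w_{i,s} = \frac{(\tilde\xi_2-\zeta_{s,2})'M^{-1\prime}(\tilde Z'\tilde Z)M^{-1}e_ie_i'(\tilde\xi_2-\zeta_{s,2})}{(\tilde\xi_2-\zeta_{s,2})'M^{-1\prime}(\tilde Z'\tilde Z)M^{-1}(\tilde\xi_2-\zeta_{s,2})}.$$

The estimator is given by the average over the $S$ simulation draws:

$$\hat\beta^*_{RB,c} = \frac{1}{S}\sum_{s=1}^S\hat\beta_s.$$

In our application, we use $S = 100{,}000$ simulation draws.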
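The three steps above can be sketched in Python as follows. This is a minimal illustration under our own assumptions rather than the authors' implementation: it uses the heteroskedasticity-robust $\hat\Sigma$ of step 1.) with no degrees-of-freedom correction and no clustering, draws $\zeta_s$ with NumPy's `multivariate_normal`, and uses hypothetical helper names (`residualize`, `unbiased_iv`). The argument `W` is the matrix of control variables, following the notation of this section, while `W_t` inside the function is the GMM weight matrix $M^{-1\prime}(\tilde Z'\tilde Z)M^{-1}$.

```python
import numpy as np
from scipy.special import erfcx


def beta_U(xi1, xi2, s12, s22):
    """Single-instrument unbiased estimator (vectorized)."""
    tau = np.sqrt(np.pi / 2) * erfcx(xi2 / np.sqrt(2 * s22)) / np.sqrt(s22)
    return tau * (xi1 - s12 / s22 * xi2) + s12 / s22


def residualize(A, W):
    """Residuals from regressing A (vector or matrix) on the controls W."""
    coef, *_ = np.linalg.lstsq(W, A, rcond=None)
    return A - W @ coef


def unbiased_iv(Y, X, Z, W, c=0.5, S=10_000, seed=None):
    """Sketch of steps 1.)-3.) with a heteroskedasticity-robust Sigma_hat."""
    Z = Z.reshape(len(Z), -1)
    Yt, Xt, Zt = residualize(Y, W), residualize(X, W), residualize(Z, W)
    k = Zt.shape[1]
    ZZ = Zt.T @ Zt
    ZZinv = np.linalg.inv(ZZ)
    # Step 1: coefficients on Z (Frisch-Waugh-Lovell) and residuals
    xi1, xi2 = ZZinv @ Zt.T @ Yt, ZZinv @ Zt.T @ Xt
    U, V = Yt - Zt @ xi1, Xt - Zt @ xi2
    G = np.hstack([U[:, None] * Zt, V[:, None] * Zt])  # row t: (U_t Z_t', V_t Z_t')
    B = np.kron(np.eye(2), ZZinv)
    Sigma = B @ (G.T @ G) @ B
    xi = np.concatenate([xi1, xi2])
    if k == 1:
        # Step 2: just-identified case
        return float(beta_U(xi[0], xi[1], Sigma[0, 1], Sigma[1, 1]))
    # Step 3: transform with M from (9), using the diagonal of Sigma_hat_22
    C = np.full((k, k), float(c))
    np.fill_diagonal(C, 1.0)
    M = C @ np.diag(1.0 / np.sqrt(np.diag(Sigma[k:, k:])))
    L = np.kron(np.eye(2), M)
    xi_t, Sig_t = L @ xi, L @ Sigma @ L.T
    rng = np.random.default_rng(seed)
    zeta = rng.multivariate_normal(np.zeros(2 * k), Sig_t, size=S)  # (S, 2k)
    # beta_U at xi_tilde(i) + zeta_s(i) with variance 2 * Sigma_tilde(i)
    s12 = 2 * np.diag(Sig_t[:k, k:])
    s22 = 2 * np.diag(Sig_t)[k:]
    est = beta_U(xi_t[:k] + zeta[:, :k], xi_t[k:] + zeta[:, k:], s12, s22)
    # weights w_{i,s} built from xi_tilde_2 - zeta_{s,2}
    M_inv = np.linalg.inv(M)
    W_t = M_inv.T @ ZZ @ M_inv
    A = xi_t[k:] - zeta[:, k:]  # (S, k)
    num = (A @ W_t) * A  # entry (s, i) is a_s' W_t e_i e_i' a_s
    w = num / num.sum(axis=1, keepdims=True)
    return float(np.mean(np.sum(w * est, axis=1)))
```

Under strong instruments the output should be close to 2SLS, in line with Theorem 3.1; with weak instruments it can differ noticeably, and the residual simulation noise from a finite $S$ should be checked by redrawing $\zeta$.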
5.2 Hornung (2014)

Hornung (2014) studies the long term impact of the flight of skilled Huguenot refugees from France to Prussia in the seventeenth century. He finds that regions of Prussia which received more Huguenot refugees during the late seventeenth century had a higher level of productivity in textile manufacturing at the start of the nineteenth century. To address concerns over endogeneity in Huguenot settlement patterns and obtain an estimate for the causal effect of skilled immigration on productivity, Hornung (2014) considers specifications which instrument Huguenot immigration to a given region using population losses due to plague at the end of the Thirty Years' War. For more information on the data and motivation of the instrument, see Hornung (2014).


Hornung’s argument for the validity of his instrument clearly implies that the first-stage effect should be positive, but the relationship between the instrument and the endogenous regressors appears to be fairly weak. In particular, the four IV specifications reported in Tables 4 and 5 of Hornung (2014) have first-stage F-statistics of 3.67, 4.79, 5.74, and 15.35, respectively. Thus, it seems that the conventional normal approximation to the distribution of IV estimates may be unreliable in this context. In each of the four main IV specifications considered by Hornung, we compare 2SLS and Fuller (again with constant equal to one) to our estimator. Since there is only a single instrument in this context, the model is just-identified and the unbiased estimator is unique. In each specification we also compute and report an identification-robust Anderson-Rubin confidence set for the coefficient on the endogenous regressor. The results are reported in Table 1.

As we can see from Table 1, our unbiased estimates in specifications I-III are smaller than the 2SLS estimates computed in Hornung (2014) (the unbiased estimate is smaller in specification IV as well, though the difference only appears in the fourth decimal place). Fuller estimates are, in turn, smaller than our unbiased estimates. Nonetheless, the difference between the 2SLS and unbiased estimates is less than half of the 2SLS standard error in every specification. In specifications I-III, where the instruments are relatively weak, the 95% AR confidence sets are substantially wider than 95% confidence sets calculated using 2SLS standard errors, while in specification IV the AR confidence set is fairly similar to the conventional 2SLS confidence set.

Specification                   I                       II               III              IV
X: Percent Huguenots in 1700
  2SLS                          3.48                    3.38             1.67
  Fuller                        3.17                    3.08             1.59
  Unbiased                      3.24                    3.14             1.61
X: log Huguenots in 1700
  2SLS                                                                                    0.07
  Fuller                                                                                  0.07
  Unbiased                                                                                0.07
95% AR Confidence Set           (-∞,-59.23]∪[1.55,∞)    [1.64,19.12]     [-0.45,5.93]     [-0.01,0.16]
Other controls                  Yes                     Yes              Yes              Yes
Observations                    150                     150              186              186
Number of Towns                 57                      57               71               71
First Stage F-Statistic         3.67                    4.79             5.74             15.35

Table 1: Results in Hornung (2014) data. Specifications in columns I and II correspond to Table 4 columns (3) and (5) in Hornung (2014), respectively, while columns III and IV correspond to Table 5 columns (3) and (6) in Hornung (2014). Y = log output, X as indicated, and Z = unadjusted population losses in I, interpolated population losses in II, and population losses averaged over several data sources in III and IV. See Hornung (2014) for details. The 2SLS and Fuller rows report two stage least squares and Fuller estimates, respectively, while Unbiased reports $\hat\beta_U$. Other controls include a constant, a dummy for whether a town had relevant textile production in 1685, measurable inputs to the production process, and others as in Hornung (2014). As in Hornung (2014), all covariance estimates are clustered at the town level. Note that the unbiased and Fuller estimates, as well as the AR confidence sets, have been updated to correct an error in the March 22, 2015 version of the present paper.


5.3 Angrist & Krueger (1991)

Angrist & Krueger (1991) are interested in the relationship between education and labor market earnings. They argue that students born later in the calendar year face a longer period of compulsory schooling than those born earlier in the calendar year, and that quarter of birth is a valid instrument for years of schooling. As we note above, their argument implies that the sign of the first-stage effect is known. A substantial literature, beginning with Bound et al. (1995), notes that the relationship between the instruments and the endogenous regressor appears to be quite weak in some specifications considered in Angrist & Krueger (1991). Here we consider four specifications from Staiger & Stock (1997), based on the 1930-1939 cohort. See Angrist & Krueger (1991) and Staiger & Stock (1997) for more on the data and specifications.


We calculate the unbiased estimators $\hat\beta^*_{RB}$, $\hat\beta^*_{RB,0.1}$, $\hat\beta^*_{RB,0.5}$ and $\hat\beta^*_{RB,0.9}$. In all cases we take $W = Z'Z$. To calculate confidence sets we use the quasi-CLR (or GMM-M) test of Kleibergen (2005), which simplifies to the CLR test of Moreira (2003) under homoskedasticity and so delivers nearly-optimal confidence sets in that case (see Mikusheva 2010). Thus, since as discussed above the data in this application appear reasonably close to homoskedastic, we may reasonably expect the quasi-CLR confidence set to perform well. All results are reported in Table 2.

A few points are notable from these results. First, we see that in specifications I and II, which have the largest first stage F-statistics, the unbiased estimates are quite close to the other point estimates. Moreover, in these specifications the choice of $c$ makes little difference. By contrast, in specification III, where the instruments appear to be quite weak, the unbiased estimates differ substantially, with $\hat\beta^*_{RB}$ yielding a negative point estimate and $\hat\beta^*_{RB,c}$ for $c\in\{0.1, 0.5, 0.9\}$ yielding positive estimates substantially larger than those of the other estimators considered.¹⁶ A similar, though less pronounced, version of this phenomenon arises in specification IV, where unbiased estimates are smaller than those based on conventional methods and $\hat\beta^*_{RB}$ is almost 20% smaller than estimates based on other choices of $c$.

Specification                        I                II               III               IV
2SLS                                 0.099            0.081            0.060             0.081
Fuller                               0.100            0.084            0.058             0.098
LIML                                 0.100            0.084            0.057             0.098
$\hat\beta^*_{RB}$                   0.097            0.085            -0.041            0.056
$\hat\beta^*_{RB,c}$, c = 0.1        0.098            0.083            0.135             0.066
$\hat\beta^*_{RB,c}$, c = 0.5        0.098            0.083            0.135             0.066
$\hat\beta^*_{RB,c}$, c = 0.9        0.098            0.083            0.135             0.066
First Stage F                        30.582           4.625            1.579             1.823
QCLR CS                              [0.059,0.144]    [0.046,0.127]    [-0.588,0.668]    [0.056,0.150]
Controls
  Base Controls                      Yes              Yes              Yes               Yes
  Age, Age²                          No               No               Yes               Yes
  SOB                                No               No               No                Yes
Instruments
  QOB                                Yes              Yes              Yes               Yes
  QOB*YOB                            No               Yes              Yes               Yes
  QOB*SOB                            No               No               No                Yes
# instruments                        3                30               28                178
Observations                         329,509          329,509          329,509           329,509

Table 2: Results for Angrist & Krueger (1991) data. Specifications as in Staiger & Stock (1997). Y = log weekly wages, X = years of schooling, instruments Z and exogenous controls as indicated. QCLR is the quasi-CLR (or GMM-M) confidence set of Kleibergen (2005). Unbiased estimators calculated by averaging over 100,000 draws of $\zeta$.

As in the simulations, there is very little difference between the estimates for $c\in\{0.1, 0.5, 0.9\}$. In particular, while not exactly the same, the estimates coincide once rounded to three decimal places in all specifications. Given that these estimators are more robust to violations of the sign restriction than that with $c = 0$, we think it makes more sense to focus on these estimates.


6 Conclusion 


In this paper, we show that a sign restriction on the first stage suffices to allow finite-sample unbiased estimation in linear IV models with normal errors and known reduced-form error covariance. Our results suggest several avenues for further research. First, while the focus of this paper is on estimation, recent work by Mills et al. (2014) finds good power for particular identification-robust conditional t-tests, suggesting that it may be interesting to consider tests based on our unbiased estimators, particularly in over-identified contexts where the Anderson-Rubin test is no longer uniformly most powerful unbiased. More broadly, it may be interesting to study other ways to use knowledge of the first stage sign, both for testing and estimation purposes.

16 All unbiased estimates are calculated by averaging over 100,000 draws of $\zeta$. For all estimates except $\hat\beta^*_{RB}$ in specification III, the residual randomness is small. For $\hat\beta^*_{RB}$ in specification III, however, redrawing $\zeta$ yields substantially different point estimates. This issue persists even if we increase the number of $\zeta$ draws to 1,000,000.
This appendix contains proofs and additional results for the paper “Unbiased Instrumental Variables Estimation Under Known First-Stage Sign.” Appendix A gives proofs for results stated in the main text. Appendix B derives asymptotic results for models with non-normal errors and an unknown reduced-form error variance. Appendix C relates our results to those of Hirano & Porter (2015). Appendix D derives a lower bound on the risk of unbiased estimators in over-identified models, discusses cases in which the bound is attained, and proves that there is no uniformly minimum risk unbiased estimator in such models. Appendix F gives additional simulation results for the just-identified case, while Appendix G details our simulation design for the over-identified case.


A Proofs 

This appendix contains proofs of the results in the main text. The notation is the same 
as in the main text. 

A.1 Single Instrument Case

This section proves the results from Section 2, which treats the single instrument case ($k = 1$). We prove Lemma 2.1 and Theorems 2.1, 2.2 and 2.3.

We first prove Lemma 2.1, which shows unbiasedness of $\hat\tau$ for $1/\pi$. As discussed in the main text, this result is known in the literature (see, e.g., pp. 181-182 of Voinov & Nikulin 1993). We give a constructive proof based on elementary calculus (Voinov & Nikulin provide a derivation based on the bilateral Laplace transform).


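Proof of Lemma 2.1. Since $\xi_2/\sigma_2\sim N(\pi/\sigma_2,1)$, we have

$$E_{\pi,\beta}\,\tau(\xi_2,\sigma_2^2)=\frac{1}{\sigma_2}\int\frac{1-\Phi(x)}{\phi(x)}\,\phi(x-\pi/\sigma_2)\,dx=\frac{1}{\sigma_2}\int(1-\Phi(x))\exp\left((\pi/\sigma_2)x-(\pi/\sigma_2)^2/2\right)dx$$
$$=\frac{1}{\sigma_2}\exp\left(-(\pi/\sigma_2)^2/2\right)\left\{\left[(1-\Phi(x))(\sigma_2/\pi)\exp((\pi/\sigma_2)x)\right]_{x=-\infty}^{\infty}+\int(\sigma_2/\pi)\exp((\pi/\sigma_2)x)\,\phi(x)\,dx\right\},$$

using integration by parts to obtain the last equality. Since the first term in brackets in the last line is zero, this is equal to

$$\frac{1}{\sigma_2}\int(\sigma_2/\pi)\exp\left((\pi/\sigma_2)x-(\pi/\sigma_2)^2/2\right)\phi(x)\,dx=\frac{1}{\pi}\int\phi(x-\pi/\sigma_2)\,dx=\frac{1}{\pi}.$$

□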
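As an illustrative numerical check of Lemma 2.1 (not part of the original argument), the sketch below evaluates $\frac{1}{\sigma_2}\int(1-\Phi(x))\exp((\pi/\sigma_2)x-(\pi/\sigma_2)^2/2)\,dx$ by composite Simpson's rule using only the Python standard library. The function names, the truncation limits and the grid size are our own assumptions, chosen so that truncation error is negligible for moderate $\pi/\sigma_2$; the argument `sigma_2` is the standard deviation $\sigma_2$, not the variance.

```python
import math


def tail_integrand(x, mu):
    """(1 - Phi(x)) * exp(mu*x - mu^2/2): the integrand in the proof of Lemma 2.1."""
    upper_tail = 0.5 * math.erfc(x / math.sqrt(2.0))  # 1 - Phi(x)
    return upper_tail * math.exp(mu * x - 0.5 * mu * mu)


def expected_tau(pi, sigma_2, n=100_000):
    """Approximate E[tau(xi_2, sigma_2^2)] for xi_2 ~ N(pi, sigma_2^2), pi > 0.

    Uses E[tau] = (1/sigma_2) * integral of tail_integrand(x, pi/sigma_2) dx with
    composite Simpson's rule on a truncated interval (n must be even).
    """
    mu = pi / sigma_2
    # Left tail decays like exp(mu*x); right tail like a Gaussian centred at mu.
    lo, hi = -60.0 / mu - 10.0, mu + 12.0
    h = (hi - lo) / n
    total = tail_integrand(lo, mu) + tail_integrand(hi, mu)
    for j in range(1, n):
        weight = 4.0 if j % 2 else 2.0
        total += weight * tail_integrand(lo + j * h, mu)
    return total * h / 3.0 / sigma_2


for pi, sigma_2 in [(0.5, 1.0), (2.0, 1.0), (3.0, 2.0)]:
    print(pi, sigma_2, expected_tau(pi, sigma_2), 1.0 / pi)
```

Each computed value should agree with $1/\pi$ up to quadrature error; this only corroborates the closed form and does not replace the integration-by-parts argument.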


We note that $\tau$ has an infinite $1+\varepsilon$ moment for $\varepsilon>0$.

Lemma A.1. The expectation of $\tau(\xi_2,\sigma_2^2)^{1+\varepsilon}$ is infinite for all $\pi$ and $\varepsilon>0$.

Proof. By similar calculations to those in the proof of Lemma 2.1,

$$E_{\pi,\beta}\,\tau(\xi_2,\sigma_2^2)^{1+\varepsilon}=\frac{1}{\sigma_2^{1+\varepsilon}}\int\frac{(1-\Phi(x))^{1+\varepsilon}}{\phi(x)^{\varepsilon}}\exp\left((\pi/\sigma_2)x-(\pi/\sigma_2)^2/2\right)dx.$$

For $x<0$, $1-\Phi(x)>1/2$, so the integrand is bounded from below by a constant times $\exp(\varepsilon x^2/2+(\pi/\sigma_2)x)$, which is bounded away from zero as $x\to-\infty$.

□
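The divergence in Lemma A.1 can be seen by evaluating the logarithm of the integrand directly. The sketch below uses a hypothetical helper built on the standard library only; it works on the log scale because $\phi(x)^{-\varepsilon}$ overflows quickly. For $x$ far enough below zero the $\varepsilon x^2/2$ term dominates the linear term $(\pi/\sigma_2)x$, so the log-integrand eventually becomes positive and keeps growing.

```python
import math


def log_moment_integrand(x, pi, sigma_2, eps):
    """log of (1 - Phi(x))^(1+eps) / phi(x)^eps * exp(mu*x - mu^2/2), with mu = pi/sigma_2.

    Up to the constant 1/sigma_2^(1+eps), this is the integrand for E[tau^(1+eps)]
    in the proof of Lemma A.1. Logs avoid overflow of phi(x)^(-eps) for very negative x.
    """
    mu = pi / sigma_2
    log_tail = math.log(0.5 * math.erfc(x / math.sqrt(2.0)))  # log(1 - Phi(x))
    log_phi = -0.5 * x * x - 0.5 * math.log(2.0 * math.pi)    # log phi(x)
    return (1.0 + eps) * log_tail - eps * log_phi + mu * x - 0.5 * mu * mu


# The log-integrand increases without bound as x -> -infinity,
# so the integral over (-infinity, 0) cannot be finite.
for x in (-10.0, -50.0, -100.0, -1000.0):
    print(x, log_moment_integrand(x, pi=1.0, sigma_2=1.0, eps=0.1))
```

Smaller $\varepsilon$ or larger $\pi/\sigma_2$ only pushes the point where growth takes over further to the left; it does not remove it.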


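Proof of Theorem 2.1. To establish unbiasedness, note that since $\xi_2$ and $\xi_1-\frac{\sigma_{12}}{\sigma_2^2}\xi_2$ are jointly normal with zero covariance, they are independent. Thus,

$$E_{\pi,\beta}\,\hat\beta_U(\xi,\Sigma)=\left(E_{\pi,\beta}\,\tau(\xi_2,\sigma_2^2)\right)E_{\pi,\beta}\left(\xi_1-\frac{\sigma_{12}}{\sigma_2^2}\xi_2\right)+\frac{\sigma_{12}}{\sigma_2^2}=\frac{1}{\pi}\left(\pi\beta-\frac{\sigma_{12}}{\sigma_2^2}\pi\right)+\frac{\sigma_{12}}{\sigma_2^2}=\beta,$$

since $E_{\pi,\beta}\,\tau(\xi_2,\sigma_2^2)=1/\pi$ by Lemma 2.1.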

To establish uniqueness, consider any unbiased estimator $\hat\beta(\xi,\Sigma)$. By unbiasedness,

$$E_{\pi,\beta}\left[\hat\beta(\xi,\Sigma)-\hat\beta_U(\xi,\Sigma)\right]=0\quad\text{for all }(\pi,\beta)\in\Theta.$$

The parameter space contains an open set by assumption, so by Theorem 4.3.1 of Lehmann & Romano (2005) the family of distributions of $\xi$ under $(\pi,\beta)\in\Theta$ is complete. Thus $\hat\beta(\xi,\Sigma)-\hat\beta_U(\xi,\Sigma)=0$ almost surely for all $(\pi,\beta)\in\Theta$ by the definition of completeness.

□

Proof of Theorem 2.2. If $E_{\pi,\beta}|\hat\beta_U(\xi,\Sigma)|^{1+\varepsilon}$ were finite, then $E_{\pi,\beta}|\hat\beta_U(\xi,\Sigma)-\sigma_{12}/\sigma_2^2|^{1+\varepsilon}$ would be finite as well by Minkowski's inequality. But

$$E_{\pi,\beta}\left|\hat\beta_U(\xi,\Sigma)-\sigma_{12}/\sigma_2^2\right|^{1+\varepsilon}=E_{\pi,\beta}\left|\tau(\xi_2,\sigma_2^2)\right|^{1+\varepsilon}E_{\pi,\beta}\left|\xi_1-\frac{\sigma_{12}}{\sigma_2^2}\xi_2\right|^{1+\varepsilon},$$

and the second term is nonzero since $\Sigma$ is positive definite. Thus, the $1+\varepsilon$ absolute moment is infinite by Lemma A.1. The claim that any unbiased estimator has infinite $1+\varepsilon$ moment follows from Rao-Blackwell: since $\hat\beta_U(\xi,\Sigma)=E[\hat\beta\mid\xi]$ for any unbiased estimator $\hat\beta$ by the uniqueness of the non-randomized unbiased estimator based on $\xi$, Jensen's inequality implies that the $1+\varepsilon$ moment of $|\hat\beta|$ is bounded from below by the (infinite) $1+\varepsilon$ moment of $|\hat\beta_U|$. □

We now consider the behavior of $\hat\beta_U$ relative to the usual 2SLS estimator (which, in the single instrument case considered here, is given by $\hat\beta_{2SLS}=\xi_1/\xi_2$) as $\pi\to\infty$.


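Proof of Theorem 2.3. Note that

$$\hat\beta_U-\hat\beta_{2SLS}=\tau(\xi_2,\sigma_2^2)\left(\xi_1-\frac{\sigma_{12}}{\sigma_2^2}\xi_2\right)+\frac{\sigma_{12}}{\sigma_2^2}-\frac{\xi_1}{\xi_2}=\left(\xi_2\tau(\xi_2,\sigma_2^2)-1\right)\left(\frac{\xi_1}{\xi_2}-\frac{\sigma_{12}}{\sigma_2^2}\right).$$

As $\pi\to\infty$, $\xi_1/\xi_2=\hat\beta_{2SLS}=O_p(1)$, so it suffices to show that $\pi\left(\xi_2\tau(\xi_2,\sigma_2^2)-1\right)=o_p(1)$ as $\pi\to\infty$. Note that, by Section 2.3.4 of Small (2010), on the event $\xi_2>0$ (whose probability tends to one),

$$\left|\pi\left(\xi_2\tau(\xi_2,\sigma_2^2)-1\right)\right|=\pi\left|\frac{\xi_2}{\sigma_2}\frac{1-\Phi(\xi_2/\sigma_2)}{\phi(\xi_2/\sigma_2)}-1\right|\le\pi\frac{\sigma_2^2}{\xi_2^2}=\frac{\pi}{\xi_2}\cdot\frac{\sigma_2^2}{\xi_2}.$$

This converges in probability to zero since $\pi/\xi_2\overset{p}{\to}1$ and $\sigma_2^2/\xi_2\overset{p}{\to}0$ as $\pi\to\infty$.

□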
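The algebraic identity at the start of this proof, and the Mills ratio bound $|\xi_2\tau(\xi_2,\sigma_2^2)-1|\le\sigma_2^2/\xi_2^2$ for $\xi_2>0$, can be spot-checked numerically. The sketch below is a minimal standard-library implementation of $\tau$, $\hat\beta_U$ and $\hat\beta_{2SLS}$ for the single-instrument case; `var2` stands for $\sigma_2^2$, `s12` for $\sigma_{12}$, and the input values are arbitrary. Computing $1-\Phi$ through `math.erfc` is only reliable while $\phi(\xi_2/\sigma_2)$ does not underflow (roughly $\xi_2/\sigma_2<37$), which is an assumption of this sketch.

```python
import math


def tau(x2, var2):
    """tau(xi_2, sigma_2^2) = (1/sigma_2) (1 - Phi(xi_2/sigma_2)) / phi(xi_2/sigma_2)."""
    sd = math.sqrt(var2)
    z = x2 / sd
    upper_tail = 0.5 * math.erfc(z / math.sqrt(2.0))             # 1 - Phi(z)
    density = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # phi(z)
    return upper_tail / density / sd


def beta_u(x1, x2, s12, var2):
    """Single-instrument unbiased estimator beta_U(xi, Sigma)."""
    return tau(x2, var2) * (x1 - s12 / var2 * x2) + s12 / var2


def beta_2sls(x1, x2):
    return x1 / x2


# Identity: beta_U - beta_2SLS = (xi_2 tau - 1)(xi_1/xi_2 - sigma_12/sigma_2^2).
x1, x2, s12, var2 = 1.3, 2.5, 0.4, 1.0
lhs = beta_u(x1, x2, s12, var2) - beta_2sls(x1, x2)
rhs = (x2 * tau(x2, var2) - 1.0) * (x1 / x2 - s12 / var2)
print(lhs, rhs)

# Mills ratio bound: |xi_2 tau - 1| <= sigma_2^2 / xi_2^2 for xi_2 > 0.
for x in (1.0, 3.0, 10.0):
    print(x, abs(x * tau(x, var2) - 1.0), var2 / x**2)
```

The second loop illustrates why $\pi(\xi_2\tau-1)$ vanishes: the gap shrinks like $\sigma_2^2/\xi_2^2$, faster than $\xi_2\approx\pi$ grows.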


The following lemma regarding the mean absolute deviation of $\hat\beta_U$ will be useful in the next section treating the case with multiple instruments.
Lemma A.2. For a constant $K(\beta,\Sigma)$ depending only on $\Sigma$ and $\beta$ (but not on $\pi$),

$$\pi\,E_{\pi,\beta}\left|\hat\beta_U(\xi,\Sigma)-\beta\right|\le K(\beta,\Sigma).$$

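Proof. We have, writing $\tau=\tau(\xi_2,\sigma_2^2)$,

$$\pi(\hat\beta_U-\beta)=\pi\left[\tau\left(\xi_1-\frac{\sigma_{12}}{\sigma_2^2}\xi_2\right)+\frac{\sigma_{12}}{\sigma_2^2}-\beta\right]=\pi\tau\cdot\left(\xi_1-\frac{\sigma_{12}}{\sigma_2^2}\xi_2\right)+\pi\frac{\sigma_{12}}{\sigma_2^2}-\pi\beta$$
$$=\pi\tau\cdot\left(\xi_1-\pi\beta-\frac{\sigma_{12}}{\sigma_2^2}(\xi_2-\pi)\right)+\pi^2\tau\beta-\pi^2\tau\frac{\sigma_{12}}{\sigma_2^2}+\pi\frac{\sigma_{12}}{\sigma_2^2}-\pi\beta$$
$$=\pi\tau\cdot\left(\xi_1-\pi\beta-\frac{\sigma_{12}}{\sigma_2^2}(\xi_2-\pi)\right)+\pi(\pi\tau-1)\left(\beta-\frac{\sigma_{12}}{\sigma_2^2}\right).$$

Using this and the fact that $\xi_2$ and $\xi_1-\frac{\sigma_{12}}{\sigma_2^2}\xi_2$ are independent, it follows that

$$\pi E_{\pi,\beta}\left|\hat\beta_U-\beta\right|\le E_{\pi,\beta}\left|\xi_1-\pi\beta-\frac{\sigma_{12}}{\sigma_2^2}(\xi_2-\pi)\right|+\left|\beta-\frac{\sigma_{12}}{\sigma_2^2}\right|\pi E_{\pi,\beta}\left|\pi\tau-1\right|,$$

where we have used the fact that $E_{\pi,\beta}\,\pi\tau=1$. The only term in the above expression that depends on $\pi$ is $\pi E_{\pi,\beta}|\pi\tau-1|$. Note that this is bounded above by $\pi E_{\pi,\beta}\,\pi\tau+\pi=2\pi$ (using unbiasedness and positivity of $\tau$), so we can assume an arbitrary lower bound on $\pi$ when bounding this term.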

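Letting $\tilde\pi=\pi/\sigma_2$, we have $\xi_2/\sigma_2\sim N(\tilde\pi,1)$, so that

$$\frac{\pi}{\sigma_2}E_{\pi,\beta}\left|\pi\tau-1\right|=\tilde\pi\,E_{\pi,\beta}\left|\tilde\pi\frac{1-\Phi(\xi_2/\sigma_2)}{\phi(\xi_2/\sigma_2)}-1\right|=\tilde\pi\int\left|\tilde\pi\frac{1-\Phi(z)}{\phi(z)}-1\right|\phi(z-\tilde\pi)\,dz.$$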


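Let $\varepsilon>0$ be a constant to be determined later in the proof. By (1.1) in Baricz (2008) (the Mills ratio bounds $\frac{z}{z^2+1}<\frac{1-\Phi(z)}{\phi(z)}<\frac1z$ for $z>0$),

$$\tilde\pi\int_{z>\tilde\pi\varepsilon}\left|\tilde\pi\frac{1-\Phi(z)}{\phi(z)}-1\right|\phi(z-\tilde\pi)\,dz\le\tilde\pi^2\int_{z>\tilde\pi\varepsilon}\left|\frac1z-\frac1{\tilde\pi}\right|\phi(z-\tilde\pi)\,dz+\tilde\pi^2\int_{z>\tilde\pi\varepsilon}\left|\frac{z}{z^2+1}-\frac1{\tilde\pi}\right|\phi(z-\tilde\pi)\,dz.$$

The first term is

$$\tilde\pi^2\int_{z>\tilde\pi\varepsilon}\frac{|\tilde\pi-z|}{\tilde\pi z}\phi(z-\tilde\pi)\,dz\le\tilde\pi^2\int_{z>\tilde\pi\varepsilon}\frac{|\tilde\pi-z|}{\tilde\pi^2\varepsilon}\phi(z-\tilde\pi)\,dz\le\frac1\varepsilon\int|u|\phi(u)\,du.$$

The second term is

$$\tilde\pi^2\int_{z>\tilde\pi\varepsilon}\frac{|\tilde\pi-(z+1/z)|}{\tilde\pi(z+1/z)}\phi(z-\tilde\pi)\,dz\le\tilde\pi^2\int_{z>\tilde\pi\varepsilon}\frac{|\tilde\pi-z|+1/z}{\tilde\pi^2\varepsilon}\phi(z-\tilde\pi)\,dz\le\frac1\varepsilon\int\left(|u|+\frac{1}{\varepsilon\tilde\pi}\right)\phi(u)\,du.$$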
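We also have

$$\tilde\pi\int_{z\le\tilde\pi\varepsilon}\left|\tilde\pi\frac{1-\Phi(z)}{\phi(z)}-1\right|\phi(z-\tilde\pi)\,dz\le\tilde\pi^2\int_{z\le\tilde\pi\varepsilon}\frac{1-\Phi(z)}{\phi(z)}\phi(z-\tilde\pi)\,dz+\tilde\pi\int_{z\le\tilde\pi\varepsilon}\phi(z-\tilde\pi)\,dz.$$

The second term is equal to $\tilde\pi\Phi(\tilde\pi\varepsilon-\tilde\pi)$, which is bounded uniformly over $\tilde\pi$ for $\varepsilon<1$. The first term is

$$\tilde\pi^2\int_{z\le\tilde\pi\varepsilon}(1-\Phi(z))\exp\left(\tilde\pi z-\tfrac12\tilde\pi^2\right)dz=\tilde\pi^2\int_{z\le\tilde\pi\varepsilon}\int_{t>z}\phi(t)\exp\left(\tilde\pi z-\tfrac12\tilde\pi^2\right)dt\,dz$$
$$=\tilde\pi^2\int_{t\in\mathbb R}\int_{z\le\min\{t,\tilde\pi\varepsilon\}}\phi(t)\exp\left(\tilde\pi z-\tfrac12\tilde\pi^2\right)dz\,dt=\tilde\pi^2\exp\left(-\tfrac12\tilde\pi^2\right)\int_{t\in\mathbb R}\phi(t)\left[\frac{1}{\tilde\pi}\exp(\tilde\pi z)\right]_{z=-\infty}^{\min\{t,\tilde\pi\varepsilon\}}dt$$
$$=\tilde\pi\exp\left(-\tfrac12\tilde\pi^2\right)\int_{t\in\mathbb R}\phi(t)\exp\left(\tilde\pi\min\{t,\tilde\pi\varepsilon\}\right)dt\le\tilde\pi\exp\left(-\tfrac12\tilde\pi^2+\varepsilon\tilde\pi^2\right).$$

For $\varepsilon<1/2$, this is uniformly bounded over all $\tilde\pi>0$.

□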
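The conclusion of Lemma A.2 rests on boundedness of $\tilde\pi\,E|\tilde\pi(1-\Phi(Z))/\phi(Z)-1|$ for $Z\sim N(\tilde\pi,1)$. The sketch below approximates this quantity with the midpoint rule, writing the integrand as $|\tilde\pi(1-\Phi(z))\exp(\tilde\pi z-\tilde\pi^2/2)-\phi(z-\tilde\pi)|$ exactly as in the proof, which avoids dividing by $\phi(z)$. The window $[\tilde\pi-12,\tilde\pi+12]$ and the grid size are our assumptions and are adequate for roughly $3\le\tilde\pi\le20$. Heuristically, the values should settle near $E|U|=\sqrt{2/\pi}\approx0.80$ for $U\sim N(0,1)$ (here $\pi$ is the circle constant), consistent with the bound.

```python
import math


def scaled_abs_dev(pt, n=40_000):
    """Midpoint-rule approximation of pt * E|pt * R(Z) - 1|, Z ~ N(pt, 1).

    R(z) = (1 - Phi(z)) / phi(z) is the Mills ratio and pt plays the role of
    pi-tilde = pi / sigma_2. Uses R(z) phi(z - pt) = (1 - Phi(z)) exp(pt z - pt^2/2).
    """
    lo, hi = pt - 12.0, pt + 12.0  # assumed window; tails outside are negligible here
    h = (hi - lo) / n
    total = 0.0
    for j in range(n):
        z = lo + (j + 0.5) * h
        tail = 0.5 * math.erfc(z / math.sqrt(2.0))                        # 1 - Phi(z)
        dens = math.exp(-0.5 * (z - pt) ** 2) / math.sqrt(2.0 * math.pi)  # phi(z - pt)
        total += abs(pt * tail * math.exp(pt * z - 0.5 * pt * pt) - dens)
    return pt * total * h


for pt in (3.0, 5.0, 10.0, 20.0):
    print(pt, scaled_abs_dev(pt))
```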


A.2 Multiple Instrument Case 

This section proves Theorem 3.1 and extends this theorem to cover unbiased estimators that are efficient under strong instrument asymptotics in the heteroskedastic case. In particular, we prove an extension of this theorem allowing for unbiased estimators that are asymptotically equivalent to a GMM estimator of the form $\hat\beta_{GMM,W}=\frac{\xi_2'W\xi_1}{\xi_2'W\xi_2}$, where $W=W(\xi)$ is a data dependent weighting matrix. For Theorem 3.1, $W$ is the deterministic matrix $Z'Z$. In models with non-homoskedastic errors the two step GMM estimator with weighting matrix

$$\hat W=\left(\Sigma_{11}-\hat\beta_{2SLS}(\Sigma_{12}+\Sigma_{21})+\hat\beta_{2SLS}^2\Sigma_{22}\right)^{-1}\qquad(10)$$
is asymptotically efficient under strong instruments. Here, $\hat W$ is an estimate of the inverse of the variance matrix of the moments $\xi_1-\beta\xi_2$, which the GMM estimator sets close to zero. Let

$$w^*_{GMM,i}(\xi^{(b)})=\frac{\xi_2^{(b)\prime}\hat W(\xi^{(b)})e_ie_i'\xi_2^{(b)}}{\xi_2^{(b)\prime}\hat W(\xi^{(b)})\xi_2^{(b)}}\qquad(11)$$

where

$$\hat W(\xi^{(b)})=\left(\Sigma_{11}-\hat\beta(\xi^{(b)})(\Sigma_{12}+\Sigma_{21})+\hat\beta(\xi^{(b)})^2\Sigma_{22}\right)^{-1}$$

for a preliminary estimator $\hat\beta(\xi^{(b)})$ of $\beta$ based on $\xi^{(b)}$. The Rao-Blackwellized estimator formed by replacing $w^*$ with $w^*_{GMM}$ in the definition of $\hat\beta^*_{RB}$ gives an unbiased estimator that is asymptotically efficient under strong instrument asymptotics with non-homoskedastic errors, as we now show by proving an extension of Theorem 3.1 that covers the weight matrix in (10) in addition to the matrix $Z'Z$ used in Theorem 3.1.

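Consider the GMM estimator $\hat\beta_{GMM,W}=\frac{\xi_2'W\xi_1}{\xi_2'W\xi_2}$, where $W=W(\xi)$ is a data dependent weighting matrix. For Theorem 3.1, $W$ is the deterministic matrix $Z'Z$ while, in the extension discussed above, $W$ is defined in (10). In both cases, $W\overset{p}{\to}W^*$ for some positive definite matrix $W^*$ under the strong instrument asymptotics in the theorem. For this $W^*$, define the oracle weights

$$w_i^*=\frac{\pi'W^*e_i\,\pi_i}{\pi'W^*\pi}=\frac{\pi'W^*e_ie_i'\pi}{\pi'W^*\pi}$$

and the oracle estimator

$$\hat\beta^o_{RB}=\hat\beta_{RB}(\xi,\Sigma;w^*)=\hat\beta_w(\xi,\Sigma;w^*)=\sum_{i=1}^kw_i^*\hat\beta_U(\xi(i),\Sigma(i)).$$

Define the estimated weights as in (11):

$$\hat w_i(\xi^{(b)})=\frac{\xi_2^{(b)\prime}\hat W(\xi^{(b)})e_ie_i'\xi_2^{(b)}}{\xi_2^{(b)\prime}\hat W(\xi^{(b)})\xi_2^{(b)}},$$

and the Rao-Blackwellized estimator based on the estimated weights

$$\hat\beta^*_{RB}=\hat\beta_{RB}(\xi,\Sigma;\hat w)=E\left[\sum_{i=1}^k\hat w_i(\xi^{(b)})\hat\beta_U(\xi^{(a)}(i),2\Sigma(i))\,\Big|\,\xi\right].$$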
i=1 
In the general case, we will assume that $\hat w(\xi^{(b)})$ is uniformly bounded (this holds for equivalence with 2SLS under the conditions of Theorem 3.1 since $\sup_{\|u\|=1}\frac{u'Z'Ze_ie_i'u}{u'Z'Zu}$ is bounded, and one can likewise show that it holds for two step GMM provided $\Sigma$ has full rank). Let us also define the oracle linear combination of 2SLS estimators

$$\hat\beta^o_{2SLS}=\sum_{i=1}^kw_i^*\frac{\xi_{1,i}}{\xi_{2,i}}.$$

Lemma A.3. Suppose that $w$ is deterministic: $w(\xi^{(b)})=w$ for some constant vector $w$. Then $\hat\beta_{RB}(\xi,\Sigma;w)=\hat\beta_w(\xi,\Sigma;w)$.

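Proof. We have

$$\hat\beta_{RB}(\xi,\Sigma;w)=E\left[\sum_{i=1}^kw_i\hat\beta_U(\xi^{(a)}(i),2\Sigma(i))\,\Big|\,\xi\right]=\sum_{i=1}^kw_iE\left[\hat\beta_U(\xi^{(a)}(i),2\Sigma(i))\,\Big|\,\xi\right].$$

Since $\xi^{(a)}(i)=\xi(i)+\zeta(i)$ (where $\zeta(i)=(\zeta_i,\zeta_{k+i})'$), $\xi^{(a)}(i)$ is independent of $\{\xi(j)\}_{j\ne i}$ conditional on $\xi(i)$. Thus, $E[\hat\beta_U(\xi^{(a)}(i),2\Sigma(i))\mid\xi]=E[\hat\beta_U(\xi^{(a)}(i),2\Sigma(i))\mid\xi(i)]$. Since $E[\hat\beta_U(\xi^{(a)}(i),2\Sigma(i))\mid\xi(i)]$ is an unbiased estimator for $\beta$ that is a deterministic function of $\xi(i)$, it must be equal to $\hat\beta_U(\xi(i),\Sigma(i))$, the unique nonrandom unbiased estimator based on $\xi(i)$ (where uniqueness follows by completeness since the parameter space $\{(\beta\pi_i,\pi_i)\mid\pi_i\in\mathbb R_+,\beta\in\mathbb R\}$ contains an open rectangle). Plugging this in to the above display gives the result. □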


Lemma A.4. Let $\|\pi\|\to\infty$ with $\|\pi\|/\min_i\pi_i=O(1)$. Then $\|\pi\|\left(\hat\beta_{GMM,W}-\hat\beta^o_{2SLS}\right)\overset{p}{\to}0$.


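Proof. Note that

$$\hat\beta_{GMM,W}-\hat\beta^o_{2SLS}=\frac{\xi_2'W\xi_1}{\xi_2'W\xi_2}-\sum_{i=1}^kw_i^*\frac{\xi_{1,i}}{\xi_{2,i}}=\sum_{i=1}^k\left(\frac{\xi_2'We_ie_i'\xi_2}{\xi_2'W\xi_2}-w_i^*\right)\frac{\xi_{1,i}}{\xi_{2,i}}=\sum_{i=1}^k\left(\frac{\xi_2'We_ie_i'\xi_2}{\xi_2'W\xi_2}-\frac{\pi'W^*e_ie_i'\pi}{\pi'W^*\pi}\right)\left(\frac{\xi_{1,i}}{\xi_{2,i}}-\beta\right),$$

where the last equality follows since $\sum_{i=1}^k\frac{\xi_2'We_ie_i'\xi_2}{\xi_2'W\xi_2}=\sum_{i=1}^kw_i^*=1$ with probability one. For each $i$, $\pi_i(\xi_{1,i}/\xi_{2,i}-\beta)=O_p(1)$ and $\frac{\xi_2'We_ie_i'\xi_2}{\xi_2'W\xi_2}-\frac{\pi'W^*e_ie_i'\pi}{\pi'W^*\pi}\overset{p}{\to}0$ as the elements of $\pi$ approach infinity. Combining this with the above display and the fact that $\|\pi\|/\min_i\pi_i=O(1)$ gives the result. □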
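The decomposition used in the proof of Lemma A.4, $\hat\beta_{GMM,W}=\sum_iw_i\,\xi_{1,i}/\xi_{2,i}$ with $w_i=\xi_2'We_ie_i'\xi_2/(\xi_2'W\xi_2)$ and $\sum_iw_i=1$, holds exactly for any symmetric $W$. The sketch below checks it with plain Python lists; the matrix $W$, the vectors and the helper names are made-up, hypothetical inputs for illustration.

```python
def gmm_weights(x2, W):
    """w_i = (x2' W e_i)(e_i' x2) / (x2' W x2) for a symmetric k x k matrix W (list of rows)."""
    k = len(x2)
    Wx2 = [sum(W[i][j] * x2[j] for j in range(k)) for i in range(k)]  # W x2 = W' x2
    denom = sum(x2[i] * Wx2[i] for i in range(k))                      # x2' W x2
    return [Wx2[i] * x2[i] / denom for i in range(k)]


def beta_gmm(x1, x2, W):
    """xi_2' W xi_1 / xi_2' W xi_2."""
    k = len(x2)
    num = sum(x2[i] * W[i][j] * x1[j] for i in range(k) for j in range(k))
    den = sum(x2[i] * W[i][j] * x2[j] for i in range(k) for j in range(k))
    return num / den


x1 = [1.0, 2.0, -0.5]
x2 = [0.8, 1.5, 0.3]
W = [[2.0, 0.5, 0.0], [0.5, 1.0, 0.2], [0.0, 0.2, 3.0]]
w = gmm_weights(x2, W)
combo = sum(wi * a / b for wi, a, b in zip(w, x1, x2))  # sum_i w_i xi_{1,i} / xi_{2,i}
print(sum(w), beta_gmm(x1, x2, W), combo)
```

Because the weights sum to one, subtracting $\beta$ inside the sum (the last step of the display) changes nothing, which is what lets each term be scaled by $\pi_i$.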

Lemma A.5. Let $\|\pi\|\to\infty$ with $\|\pi\|/\min_i\pi_i=O(1)$. Then $\|\pi\|\left(\hat\beta^o_{2SLS}-\hat\beta^o_{RB}\right)\overset{p}{\to}0$.

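Proof. By Lemma A.3,

$$\|\pi\|\left(\hat\beta^o_{2SLS}-\hat\beta^o_{RB}\right)=\|\pi\|\sum_{i=1}^kw_i^*\left(\frac{\xi_{1,i}}{\xi_{2,i}}-\hat\beta_U(\xi(i),\Sigma(i))\right).$$

By Theorem 2.3, $\pi_i\left(\frac{\xi_{1,i}}{\xi_{2,i}}-\hat\beta_U(\xi(i),\Sigma(i))\right)\overset{p}{\to}0$. Combining this with the boundedness of $\|\pi\|/\min_i\pi_i$ gives the result. □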

Lemma A.6. Let $\|\pi\|\to\infty$ with $\|\pi\|/\min_i\pi_i=O(1)$. Then $\|\pi\|\left(\hat\beta^*_{RB}-\hat\beta^o_{RB}\right)\overset{p}{\to}0$.

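Proof. We have

$$\hat\beta^o_{RB}-\hat\beta^*_{RB}=E\left[\sum_{i=1}^k\left(w_i^*-\hat w_i(\xi^{(b)})\right)\hat\beta_U(\xi^{(a)}(i),2\Sigma(i))\,\Big|\,\xi\right]=E\left[\sum_{i=1}^k\left(w_i^*-\hat w_i(\xi^{(b)})\right)\left(\hat\beta_U(\xi^{(a)}(i),2\Sigma(i))-\beta\right)\,\Big|\,\xi\right],$$

using the fact that $\sum_{i=1}^kw_i^*=\sum_{i=1}^k\hat w_i(\xi^{(b)})=1$ with probability one. Thus,

$$E_{\beta,\pi}\left|\hat\beta^o_{RB}-\hat\beta^*_{RB}\right|\le\sum_{i=1}^kE_{\beta,\pi}\left|w_i^*-\hat w_i(\xi^{(b)})\right|\left|\hat\beta_U(\xi^{(a)}(i),2\Sigma(i))-\beta\right|=\sum_{i=1}^kE_{\beta,\pi}\left|w_i^*-\hat w_i(\xi^{(b)})\right|\,E_{\beta,\pi}\left|\hat\beta_U(\xi^{(a)}(i),2\Sigma(i))-\beta\right|.$$

As $\|\pi\|\to\infty$, $\hat w_i(\xi^{(b)})-w_i^*\overset{p}{\to}0$ so, since $\hat w_i(\xi^{(b)})$ is bounded, $E_{\beta,\pi}|w_i^*-\hat w_i(\xi^{(b)})|\to0$. Thus, it suffices to show that $\pi_iE_{\beta,\pi}\left|\hat\beta_U(\xi^{(a)}(i),2\Sigma(i))-\beta\right|=O(1)$ for each $i$. But this follows by Lemma A.2, which completes the proof. □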


B Non-Normal Errors and Unknown Reduced Form Variance

This appendix derives asymptotic results for the case with non-normal errors and an estimated reduced form covariance matrix. Section B.1 shows asymptotic unbiasedness in the weak instrument case. Section B.2 shows asymptotic equivalence with 2SLS in the strong instrument case (where, in the case with multiple instruments, the weights are chosen appropriately). The results are proved using some auxiliary lemmas, which are stated and proved in Section B.3.

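Throughout this appendix, we consider a sequence of reduced form estimators

$$\hat\xi=\begin{pmatrix}(Z'Z)^{-1}Z'Y\\(Z'Z)^{-1}Z'X\end{pmatrix},$$

which we assume satisfy a central limit theorem:

$$\sqrt T\left(\hat\xi-\begin{pmatrix}\pi_T\beta\\\pi_T\end{pmatrix}\right)\overset{d}{\to}N(0,\Sigma^*),\qquad(12)$$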


where $\pi_T$ is a sequence of parameter values and $\Sigma^*$ is a positive definite matrix. Following Staiger & Stock (1997), we distinguish between the case of weak instruments, in which $\pi_T$ converges to 0 at a $\sqrt T$ rate, and the case of strong instruments, in which $\pi_T$ converges to a vector in the interior of the positive orthant. Formally, the weak instrument case is given by the condition that

$$\sqrt T\pi_T\to\pi^*\text{ where }\pi_i^*>0\text{ for all }i,\qquad(13)$$

while the strong instrument case is given by the condition that

$$\pi_T\to\pi^*\text{ where }\pi_i^*>0\text{ for all }i.\qquad(14)$$

In both cases, we assume the availability of a consistent estimator $\hat\Sigma$ for the asymptotic variance of the reduced form estimators:

$$\hat\Sigma\overset{p}{\to}\Sigma^*.\qquad(15)$$


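The estimator is then formed as

$$\hat\beta_{RB}(\hat\xi,\hat\Sigma/T,w)=E_{\hat\Sigma/T}\left[\hat\beta_w\left(\hat\xi^{(a)},2\hat\Sigma/T,w(\hat\xi^{(b)})\right)\,\Big|\,\hat\xi\right]=\int\hat\beta_w\left(\hat\xi+T^{-1/2}\hat\Sigma^{1/2}\eta,\,2\hat\Sigma/T,\,w(\hat\xi-T^{-1/2}\hat\Sigma^{1/2}\eta)\right)dP_{N(0,I_{2k})}(\eta),$$

where $\hat\xi^{(a)}=\hat\xi+T^{-1/2}\hat\Sigma^{1/2}\eta$ and $\hat\xi^{(b)}=\hat\xi-T^{-1/2}\hat\Sigma^{1/2}\eta$ for $\eta\sim N(0,I_{2k})$ independent of $\hat\xi$ and $\hat\Sigma$, and we use the subscript in the expectation to denote the dependence of the conditional distribution of $\hat\xi^{(a)}$ and $\hat\xi^{(b)}$ on $\hat\Sigma/T$. In the single instrument case, $\hat\beta_{RB}(\hat\xi,\hat\Sigma/T,w)$ reduces to $\hat\beta_U(\hat\xi,\hat\Sigma/T)$.

For the weights $w$, we assume that $w(\xi^{(b)})$ is bounded and continuous in $\xi^{(b)}$ with $\sum_{i=1}^kw_i(\xi^{(b)})=1$ and $w_i(a\xi^{(b)})=w_i(\xi^{(b)})$ for any scalar $a$, as holds for all the weights discussed above. Using the fact that $\hat\beta_U(\sqrt a\,x,a\Omega)=\hat\beta_U(x,\Omega)$ for any scalar $a>0$ and any $x$ and $\Omega$, we have, under the above conditions on $w$,

$$\hat\beta_{RB}(\hat\xi,\hat\Sigma/T,w)=\int\hat\beta_w\left(\sqrt T\hat\xi+\hat\Sigma^{1/2}\eta,\,2\hat\Sigma,\,w(\sqrt T\hat\xi-\hat\Sigma^{1/2}\eta)\right)dP_{N(0,I_{2k})}(\eta)=\hat\beta_{RB}(\sqrt T\hat\xi,\hat\Sigma,w).$$

Thus, we can focus on the behavior of $\sqrt T\hat\xi$ and $\hat\Sigma$, which are asymptotically nondegenerate in the weak instrument case.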
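The scale invariance $\hat\beta_U(\sqrt a\,x,a\Omega)=\hat\beta_U(x,\Omega)$ for $a>0$, which lets the estimator be computed from $\sqrt T\hat\xi$ and $\hat\Sigma$ rather than $\hat\xi$ and $\hat\Sigma/T$, can be confirmed numerically. This standard-library sketch passes $x=(x_1,x_2)$ and the $2\times2$ matrix $\Omega$ as tuples; the helper names and test values are illustrative assumptions, and $a$ plays the role of $T$.

```python
import math


def tau(x2, var2):
    """tau(x2, var2) = (1/sd) (1 - Phi(x2/sd)) / phi(x2/sd), sd = sqrt(var2)."""
    sd = math.sqrt(var2)
    z = x2 / sd
    tail = 0.5 * math.erfc(z / math.sqrt(2.0))
    dens = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return tail / dens / sd


def beta_u(x, omega):
    """beta_U(x, Omega) for x = (x1, x2) and Omega = ((O11, O12), (O12, O22))."""
    (x1, x2), o12, o22 = x, omega[0][1], omega[1][1]
    return tau(x2, o22) * (x1 - o12 / o22 * x2) + o12 / o22


def rescale(x, omega, a):
    """Map (x, Omega) to (sqrt(a) x, a Omega) for a scalar a > 0."""
    r = math.sqrt(a)
    return (r * x[0], r * x[1]), tuple(tuple(a * v for v in row) for row in omega)


x, omega = (0.7, 0.9), ((1.0, 0.3), (0.3, 0.5))
for a in (4.0, 100.0, 1e4):  # a plays the role of T
    print(a, beta_u(*rescale(x, omega, a)), beta_u(x, omega))
```

The invariance comes from $\tau(\sqrt a\,x_2,a\Omega_{22})=\tau(x_2,\Omega_{22})/\sqrt a$ cancelling the $\sqrt a$ in $x_1-(\Omega_{12}/\Omega_{22})x_2$, while $\Omega_{12}/\Omega_{22}$ is unchanged.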


B.1 Weak Instrument Case


The following theorem shows that the estimator $\hat\beta_{RB}$ converges in distribution to a random variable with mean $\beta$. Note that, since convergence in distribution does not imply convergence of moments, this does not imply that the bias of $\hat\beta_{RB}$ converges to zero. While it seems likely this stronger form of asymptotic unbiasedness could be achieved under further conditions by truncating $\hat\beta_{RB}$ at a slowly increasing sequence of points, we leave this extension for future research.


Theorem B.1. Let (12), (13) and (15) hold, and suppose that $w(\xi^{(b)})$ is bounded and continuous in $\xi^{(b)}$ with $w_i(a\xi^{(b)})=w_i(\xi^{(b)})$ for any scalar $a$. Then

$$\hat\beta_{RB}(\hat\xi,\hat\Sigma/T,w)=\hat\beta_{RB}(\sqrt T\hat\xi,\hat\Sigma,w)\overset{d}{\to}\hat\beta_{RB}(\xi^*,\Sigma^*,w)$$

where $\xi^*\sim N((\beta\pi^{*\prime},\pi^{*\prime})',\Sigma^*)$ and $E\,\hat\beta_{RB}(\xi^*,\Sigma^*,w)=\beta$.

Proof. Since $\sqrt T\hat\xi\overset{d}{\to}\xi^*$ and $\hat\Sigma\overset{p}{\to}\Sigma^*$, the first display follows by the continuous mapping theorem so long as $\hat\beta_{RB}(\xi^*,\Sigma^*,w)$ is continuous in $\xi^*$ and $\Sigma^*$. Since

$$\hat\beta_{RB}(\xi^*,\Sigma^*,w)=\int\hat\beta_w\left(\xi^*+\Sigma^{*1/2}\eta,\,2\Sigma^*,\,w(\xi^*-\Sigma^{*1/2}\eta)\right)dP_{N(0,I_{2k})}(\eta)\qquad(16)$$

and the integrand is continuous in $\xi^*$ and $\Sigma^*$, it suffices to show uniform integrability over $\xi^*$ and $\Sigma^*$ in an arbitrarily small neighborhood of any point. The $p$th moment of the integrand in the above display is bounded by a constant times the sum over $i$ of

$$\int\left|\hat\beta_U\left(\xi^*(i)+\Sigma^*(i)^{1/2}z,\,2\Sigma^*(i)\right)\right|^p\phi(z_1)\phi(z_2)\,dz_1dz_2=R(\xi^*(i),\Sigma^*(i),0,p),$$

where $R$ is defined below in Section B.3. By Lemma B.1 below, this is equal to

$$\tilde R\left(\frac{\xi^*_{2,i}}{\sqrt{\Sigma^*_{22}(i)}},\,\frac{\xi^*_{1,i}}{\sqrt{\Sigma^*_{22}(i)}},\,\frac{1}{\sqrt{\Sigma^*_{22}(i)}}\left(\Sigma^*_{11}(i)-\frac{\Sigma^*_{12}(i)^2}{\Sigma^*_{22}(i)}\right)^{1/2},\,-\frac{\Sigma^*_{12}(i)}{\Sigma^*_{22}(i)},\,p\right),$$

which is bounded uniformly over a small enough neighborhood of any $\xi^*$ and $\Sigma^*$ with $\Sigma^*$ positive definite by Lemma B.2 below so long as $p<2$. Setting $1<p<2$, it follows that uniform integrability holds for (16) so that $\hat\beta_{RB}(\xi^*,\Sigma^*,w)$ is continuous, thereby giving the result. □


B.2 Strong Instrument Asymptotics 


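Let $\hat W(\xi,\Sigma)$ and $\tilde W$ be weighting matrices that converge in probability to some positive definite symmetric matrix $W$. Let

$$w^*_{GMM,i}(\xi^{(b)})=\frac{\xi_2^{(b)\prime}\hat W(\xi^{(b)},\hat\Sigma)e_ie_i'\xi_2^{(b)}}{\xi_2^{(b)\prime}\hat W(\xi^{(b)},\hat\Sigma)\xi_2^{(b)}},$$

where $e_i$ is the $i$th standard basis vector in $\mathbb R^k$, and let

$$\hat\beta_{GMM,\tilde W}=\frac{\hat\xi_2'\tilde W\hat\xi_1}{\hat\xi_2'\tilde W\hat\xi_2}.$$

The following theorem shows that $\hat\beta_{GMM,\tilde W}$ and $\hat\beta_{RB}(\sqrt T\hat\xi,\hat\Sigma,w^*_{GMM})$ are asymptotically equivalent in the strong instrument case. For the case where $\hat W(\xi,\Sigma)=\tilde W=Z'Z/T$, this gives asymptotic equivalence to 2SLS.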


Theorem B.2. Let $\hat W(\xi,\Sigma)$ and $\tilde W$ be weighting matrices that converge in probability to the same positive definite matrix $W$, such that $w^*_{GMM,i}$ defined above is uniformly bounded over $\xi^{(b)}$. Then, under (12), (14) and (15),

$$\sqrt T\left(\hat\beta_{RB}(\sqrt T\hat\xi,\hat\Sigma,w^*_{GMM})-\hat\beta_{GMM,\tilde W}\right)\overset{p}{\to}0.$$
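Proof. As with the normal case, define the oracle linear combination of 2SLS estimators

$$\hat\beta^o_{2SLS}=\sum_{i=1}^kw_i^*\frac{\hat\xi_{1,i}}{\hat\xi_{2,i}},$$

where $w_i^*=\frac{\pi^{*\prime}We_ie_i'\pi^*}{\pi^{*\prime}W\pi^*}$. We have $\sqrt T\left(\hat\beta_{RB}(\sqrt T\hat\xi,\hat\Sigma,w^*_{GMM})-\hat\beta_{GMM,\tilde W}\right)=I+II+III$, where $I=\sqrt T\left(\hat\beta_{RB}(\sqrt T\hat\xi,\hat\Sigma,w^*_{GMM})-\hat\beta_{RB}(\sqrt T\hat\xi,\hat\Sigma,w^*)\right)$, $II=\sqrt T\left(\hat\beta_{RB}(\sqrt T\hat\xi,\hat\Sigma,w^*)-\hat\beta^o_{2SLS}\right)$ and $III=\sqrt T\left(\hat\beta^o_{2SLS}-\hat\beta_{GMM,\tilde W}\right)$.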
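For the first term, note that

$$I=\sqrt T\sum_{i=1}^kE_{\hat\Sigma}\left[\left(w^*_{GMM,i}(\hat\xi^{(b)})-w_i^*\right)\hat\beta_U(\sqrt T\hat\xi^{(a)}(i),2\hat\Sigma(i))\,\Big|\,\hat\xi\right]=\sqrt T\sum_{i=1}^kE_{\hat\Sigma}\left[\left(w^*_{GMM,i}(\hat\xi^{(b)})-w_i^*\right)\left(\hat\beta_U(\sqrt T\hat\xi^{(a)}(i),2\hat\Sigma(i))-\beta\right)\,\Big|\,\hat\xi\right],$$

where the last equality follows since $\sum_{i=1}^kw^*_{GMM,i}(\hat\xi^{(b)})=\sum_{i=1}^kw_i^*=1$ with probability one. Thus, by Hölder's inequality,

$$|I|\le\sum_{i=1}^k\left(E_{\hat\Sigma}\left[\left|w^*_{GMM,i}(\hat\xi^{(b)})-w_i^*\right|^q\,\Big|\,\hat\xi\right]\right)^{1/q}\left(E_{\hat\Sigma}\left[T^{p/2}\left|\hat\beta_U(\sqrt T\hat\xi^{(a)}(i),2\hat\Sigma(i))-\beta\right|^p\,\Big|\,\hat\xi\right]\right)^{1/p}$$

for any $p$ and $q$ with $p,q>1$ and $1/p+1/q=1$ such that these conditional expectations exist. Under (14), $w^*_{GMM,i}(\hat\xi^{(b)})\overset{p}{\to}w_i^*$ so, since $w^*_{GMM,i}(\hat\xi^{(b)})$ is uniformly bounded, $E_{\hat\Sigma}\left[|w^*_{GMM,i}(\hat\xi^{(b)})-w_i^*|^q\mid\hat\xi\right]$ will converge to zero for any $q$. Thus, for this term, it suffices to bound

$$\left(E_{\hat\Sigma}\left[T^{p/2}\left|\hat\beta_U(\sqrt T\hat\xi^{(a)}(i),2\hat\Sigma(i))-\beta\right|^p\,\Big|\,\hat\xi\right]\right)^{1/p}=\sqrt T\,R\left(\sqrt T\hat\xi(i),\hat\Sigma(i),\beta,p\right)^{1/p}$$
$$=\sqrt T\,\tilde R\left(\frac{\sqrt T\hat\xi_{2,i}}{\sqrt{\hat\Sigma_{22}(i)}},\,\frac{\sqrt T(\hat\xi_{1,i}-\beta\hat\xi_{2,i})}{\sqrt{\hat\Sigma_{22}(i)}},\,\frac{1}{\sqrt{\hat\Sigma_{22}(i)}}\left(\hat\Sigma_{11}(i)-\frac{\hat\Sigma_{12}(i)^2}{\hat\Sigma_{22}(i)}\right)^{1/2},\,\beta-\frac{\hat\Sigma_{12}(i)}{\hat\Sigma_{22}(i)},\,p\right)^{1/p}$$

for $R$ and $\tilde R$ as defined in Section B.3 below. By Lemma B.3 below, this is equal to $\sqrt{\hat\Sigma_{22}(i)}/\hat\xi_{2,i}$ times an $O_p(1)$ term for $p<2$. Since $\sqrt{\hat\Sigma_{22}(i)}/\hat\xi_{2,i}\overset{p}{\to}\sqrt{\Sigma^*_{22}(i)}/\pi^*_i$, it follows that the above display is also $O_p(1)$. Thus, $I\overset{p}{\to}0$.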
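For the second term, we have

$$II=\sqrt T\sum_{i=1}^kw_i^*\left(\hat\beta_U(\sqrt T\hat\xi(i),\hat\Sigma(i))-\frac{\hat\xi_{1,i}}{\hat\xi_{2,i}}\right)=\sum_{i=1}^kw_i^*\left(\frac{\hat\xi_{1,i}}{\hat\xi_{2,i}}-\frac{\hat\Sigma_{12}(i)}{\hat\Sigma_{22}(i)}\right)\sqrt T\left(\sqrt T\hat\xi_{2,i}\,\tau(\sqrt T\hat\xi_{2,i},\hat\Sigma_{22}(i))-1\right),$$

where the first equality uses Lemma A.3 (since $w^*$ is deterministic) and the second uses the identity from the proof of Theorem 2.3. For each $i$, $\frac{\hat\xi_{1,i}}{\hat\xi_{2,i}}-\frac{\hat\Sigma_{12}(i)}{\hat\Sigma_{22}(i)}$ converges in probability to a finite constant and, by Section 2.3.4 of Small (2010),

$$\sqrt T\left|\sqrt T\hat\xi_{2,i}\,\tau(\sqrt T\hat\xi_{2,i},\hat\Sigma_{22}(i))-1\right|\le\frac{\hat\Sigma_{22}(i)}{\sqrt T\hat\xi_{2,i}^2}\overset{p}{\to}0,$$

so that $II\overset{p}{\to}0$.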

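The third term converges in probability to zero by standard arguments. We have

$$III=\sqrt T\sum_{i=1}^k\left(w_i^*-\frac{\hat\xi_2'\tilde We_ie_i'\hat\xi_2}{\hat\xi_2'\tilde W\hat\xi_2}\right)\frac{\hat\xi_{1,i}}{\hat\xi_{2,i}}=\sum_{i=1}^k\left(w_i^*-\frac{\hat\xi_2'\tilde We_ie_i'\hat\xi_2}{\hat\xi_2'\tilde W\hat\xi_2}\right)\sqrt T\left(\frac{\hat\xi_{1,i}}{\hat\xi_{2,i}}-\beta\right),$$

where the last equality follows since $\sum_{i=1}^kw_i^*=\sum_{i=1}^k\frac{\hat\xi_2'\tilde We_ie_i'\hat\xi_2}{\hat\xi_2'\tilde W\hat\xi_2}=1$ with probability one.

The result then follows from Slutsky's theorem.

□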


B.3 Auxiliary Lemmas 

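For $p>1$, $x\in\mathbb R^2$, $\Omega$ a $2\times2$ matrix and $b\in\mathbb R$, let

$$R(x,\Omega,b,p)=\int\left|\hat\beta_U(x+\Omega^{1/2}z,2\Omega)-b\right|^p\phi(z_1)\phi(z_2)\,dz_1dz_2$$

and let

$$\tilde R(t,c_1,c_2,c_3,p)=\int\left|\tau(t+z_2,2)(c_1+c_2z_1)+\left[\tau(t+z_2,2)t-1\right]c_3\right|^p\phi(z_1)\phi(z_2)\,dz_1dz_2.$$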




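Lemma B.1. For $R$ and $\tilde R$ defined above,

$$R(x,\Omega,b,p)=\tilde R\left(\frac{x_2}{\sqrt{\Omega_{22}}},\,\frac{x_1-bx_2}{\sqrt{\Omega_{22}}},\,\frac{1}{\sqrt{\Omega_{22}}}\left(\Omega_{11}-\frac{\Omega_{12}^2}{\Omega_{22}}\right)^{1/2},\,b-\frac{\Omega_{12}}{\Omega_{22}},\,p\right).$$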
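Proof. Without loss of generality, we can let $\Omega^{1/2}$ be the upper triangular square root matrix

$$\Omega^{1/2}=\begin{pmatrix}\left(\Omega_{11}-\frac{\Omega_{12}^2}{\Omega_{22}}\right)^{1/2}&\frac{\Omega_{12}}{\sqrt{\Omega_{22}}}\\0&\sqrt{\Omega_{22}}\end{pmatrix}.$$

Then

$$\hat\beta_U(x+\Omega^{1/2}z,2\Omega)=\tau\left(x_2+\sqrt{\Omega_{22}}z_2,2\Omega_{22}\right)\cdot\left(x_1+\left(\Omega_{11}-\frac{\Omega_{12}^2}{\Omega_{22}}\right)^{1/2}z_1+\frac{\Omega_{12}}{\sqrt{\Omega_{22}}}z_2-\frac{\Omega_{12}}{\Omega_{22}}\left(x_2+\sqrt{\Omega_{22}}z_2\right)\right)+\frac{\Omega_{12}}{\Omega_{22}}$$
$$=\frac{\tau(x_2/\sqrt{\Omega_{22}}+z_2,2)}{\sqrt{\Omega_{22}}}\left(x_1-\frac{\Omega_{12}}{\Omega_{22}}x_2+\left(\Omega_{11}-\frac{\Omega_{12}^2}{\Omega_{22}}\right)^{1/2}z_1\right)+\frac{\Omega_{12}}{\Omega_{22}},$$

so that

$$\hat\beta_U(x+\Omega^{1/2}z,2\Omega)-b=\frac{\tau(x_2/\sqrt{\Omega_{22}}+z_2,2)}{\sqrt{\Omega_{22}}}\left[x_1-x_2b+\left(\Omega_{11}-\frac{\Omega_{12}^2}{\Omega_{22}}\right)^{1/2}z_1\right]+\left[\frac{\tau(x_2/\sqrt{\Omega_{22}}+z_2,2)}{\sqrt{\Omega_{22}}}x_2-1\right]\left(b-\frac{\Omega_{12}}{\Omega_{22}}\right),$$

and the result follows by plugging this in to the definition of $R$.

□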
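The representation in Lemma B.1 is an algebraic identity for each fixed $z=(z_1,z_2)$, so it can be spot-checked pointwise before integrating. The sketch below compares $\hat\beta_U(x+\Omega^{1/2}z,2\Omega)-b$, computed with the upper triangular square root, against $\tau(t+z_2,2)(c_1+c_2z_1)+[\tau(t+z_2,2)t-1]c_3$ with the arguments of Lemma B.1. $\Omega$ is passed as $(\Omega_{11},\Omega_{12},\Omega_{22})$; all names and inputs are hypothetical.

```python
import math


def tau(x2, var2):
    sd = math.sqrt(var2)
    z = x2 / sd
    tail = 0.5 * math.erfc(z / math.sqrt(2.0))
    dens = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return tail / dens / sd


def beta_u_minus_b(x, om, b, z):
    """beta_U(x + Omega^{1/2} z, 2 Omega) - b, Omega^{1/2} upper triangular; om = (O11, O12, O22)."""
    o11, o12, o22 = om
    s = math.sqrt(o11 - o12 ** 2 / o22)
    y1 = x[0] + s * z[0] + o12 / math.sqrt(o22) * z[1]
    y2 = x[1] + math.sqrt(o22) * z[1]
    # beta_U(y, 2 Omega): the ratio (2 O12) / (2 O22) equals O12 / O22
    return tau(y2, 2.0 * o22) * (y1 - o12 / o22 * y2) + o12 / o22 - b


def lemma_b1_form(x, om, b, z):
    """tau(t + z2, 2)(c1 + c2 z1) + [tau(t + z2, 2) t - 1] c3 with Lemma B.1's arguments."""
    o11, o12, o22 = om
    t = x[1] / math.sqrt(o22)
    c1 = (x[0] - b * x[1]) / math.sqrt(o22)
    c2 = math.sqrt(o11 - o12 ** 2 / o22) / math.sqrt(o22)
    c3 = b - o12 / o22
    tz = tau(t + z[1], 2.0)
    return tz * (c1 + c2 * z[0]) + (tz * t - 1.0) * c3


x, om, b = (0.4, 1.1), (1.5, 0.6, 0.8), 0.3
for z in [(0.2, -0.5), (-1.0, 0.7), (1.3, 1.9)]:
    print(z, beta_u_minus_b(x, om, b, z), lemma_b1_form(x, om, b, z))
```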




We now give bounds on $R$ and $\tilde R$. By the triangle inequality,

$$\tilde R(t,c_1,c_2,c_3,p)^{1/p}\le\left(\int\tau(t+z_2,2)^p|c_1+c_2z_1|^p\phi(z_1)\phi(z_2)\,dz_1dz_2\right)^{1/p}+|c_3|\left(\int\left|\tau(t+z_2,2)t-1\right|^p\phi(z_1)\phi(z_2)\,dz_1dz_2\right)^{1/p}$$
$$=\left[C_1(t,p)\,C_2(c_1,c_2,p)\right]^{1/p}+|c_3|\,C_3(t,p)^{1/p},\qquad(17)$$

where $C_1(t,p)=\int\tau(t+z,2)^p\phi(z)\,dz$, $C_2(c_1,c_2,p)=\int|c_1+c_2z|^p\phi(z)\,dz$ and $C_3(t,p)=\int|\tau(t+z,2)t-1|^p\phi(z)\,dz$. Note that, by the triangle inequality, for $t>0$,

$$C_1(t,p)^{1/p}\le\left(\int\left|\tau(t+z,2)-1/t\right|^p\phi(z)\,dz\right)^{1/p}+1/t=(1/t)\left[C_3(t,p)^{1/p}+1\right].\qquad(18)$$
Similarly,

$$C_3(t,p)^{1/p}\le1+t\left(\int\tau(t+z,2)^p\phi(z)\,dz\right)^{1/p}=1+tC_1(t,p)^{1/p}.\qquad(19)$$

Lemma B.2. For $p<2$, $C_1(t,p)$ is bounded uniformly over $t$ on any compact set, and $\tilde R(t,c_1,c_2,c_3,p)$ is bounded uniformly over $(t,c_1,c_2,c_3)$ in any compact set.

Proof. We have

$$C_1(t,p)=\int\tau(t+z,2)^p\phi(z)\,dz=2^{-p/2}\int\left(\frac{1-\Phi((t+z)/\sqrt2)}{\phi((t+z)/\sqrt2)}\right)^p\phi(z)\,dz\le K\int\frac{\phi(z)}{\phi((t+z)/\sqrt2)^p}\,dz$$

for a constant $K$ that depends only on $p$. This is bounded uniformly over $t$ in any compact set so long as $p/4<1/2$, giving the first result. Boundedness of $\tilde R$ follows from this, (19) and boundedness of $C_2(c_1,c_2,p)$ over $c_1,c_2$ in any compact set. □


Lemma B.3. For $p<2$, $t\tilde R(t,c_1,c_2,c_3,p)^{1/p}$ is bounded uniformly over $t,c_1,c_2,c_3$ in any set such that $t$ is bounded from below away from zero and $c_1$, $c_2$ and $c_3$ are bounded.

Proof. By (17) and (18), it suffices to bound $tC_3(t,p)^{1/p}=t\left(\int|\tau(t+z,2)t-1|^p\phi(z)\,dz\right)^{1/p}$. Let $\varepsilon>0$ be a constant to be determined later. We split the integral into the regions $t+z\le\varepsilon t$ and $t+z>\varepsilon t$. We have

$$\int_{t+z\le\varepsilon t}\left|\tau(t+z,2)t-1\right|^p\phi(z)\,dz=\int_{t+z\le\varepsilon t}\frac{\left|t\left[1-\Phi((t+z)/\sqrt2)\right]-\sqrt2\phi((t+z)/\sqrt2)\right|^p}{\left[\sqrt2\phi((t+z)/\sqrt2)\right]^p}\phi(z)\,dz.\qquad(20)$$
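This is bounded by a constant times

$$\max\{t,1\}^p\int_{t+z\le\varepsilon t}\exp\left(-\frac12z^2+\frac p4(t+z)^2\right)dz=\max\{t,1\}^p\int_{t+z\le\varepsilon t}\exp\left(-\frac12z^2+\frac p4\left(z^2+2tz+t^2\right)\right)dz$$
$$=\max\{t,1\}^p\int_{t+z\le\varepsilon t}\exp\left(-\frac12\left[z^2(1-p/2)-t^2(p/2)-ptz\right]\right)dz$$
$$=\max\{t,1\}^p\exp\left(\frac{pt^2}{2(2-p)}\right)\int_{z-tp/(2-p)\le(\varepsilon-1-p/(2-p))t}\exp\left(-\frac{1-p/2}{2}\left(z-\frac{tp}{2-p}\right)^2\right)dz$$
$$=\max\{t,1\}^p\exp\left(\frac{pt^2}{2(2-p)}\right)\int_{u\le(\varepsilon-1-p/(2-p))t}\exp\left(-\frac{1-p/2}{2}u^2\right)du,$$

which is bounded by a constant times

$$\max\{t,1\}^p\exp\left(\frac{pt^2}{2(2-p)}\right)\Phi\left(t\left(\varepsilon-1-\frac{p}{2-p}\right)\sqrt{1-p/2}\right).$$

For $t(\varepsilon-1-p/(2-p))<0$, the $\Phi(\cdot)$ factor is bounded by a constant times $\exp\left(-\frac12t^2\left(1+\frac{p}{2-p}-\varepsilon\right)^2(1-p/2)\right)$. Thus, (20) is bounded uniformly over $t>0$ by a constant times $\exp(-\eta t^2)$ for some $\eta>0$ so long as

$$\left(1+\frac{p}{2-p}-\varepsilon\right)^2\left(1-\frac p2\right)>\frac{p}{2-p}.$$

At $\varepsilon=0$ the left-hand side equals $2/(2-p)$, which exceeds $p/(2-p)$ exactly when $p<2$, so this can be ensured by choosing $\varepsilon>0$ small enough so long as $p<2$. Thus, $\varepsilon>0$ can be chosen so that (20) is bounded uniformly over $t$ when scaled by $t^p$.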
For the integral over $t+z>\varepsilon t$, we have, by (1.1) in Baricz (2008),

$$\int_{t+z>\varepsilon t}\left|t\tau(t+z,2)-1\right|^p\phi(z)\,dz=t^p\int_{t+z>\varepsilon t}\left|\tau(t+z,2)-\frac1t\right|^p\phi(z)\,dz$$
$$\le t^p\int_{t+z>\varepsilon t}\left|\frac{1}{t+z}-\frac1t\right|^p\phi(z)\,dz+t^p\int_{t+z>\varepsilon t}\left|\frac{1}{(t+z)+2/(t+z)}-\frac1t\right|^p\phi(z)\,dz.$$

The first term is

$$t^p\int_{t+z>\varepsilon t}\left|\frac{z}{(t+z)t}\right|^p\phi(z)\,dz\le\frac{1}{\varepsilon^pt^p}\int|z|^p\phi(z)\,dz.$$

The second term is

$$t^p\int_{t+z>\varepsilon t}\left|\frac{z+2/(t+z)}{[(t+z)+2/(t+z)]t}\right|^p\phi(z)\,dz\le\frac{1}{\varepsilon^pt^p}\int\left(|z|+\left|\frac{2}{\varepsilon t}\right|\right)^p\phi(z)\,dz.$$

Both are bounded uniformly when scaled by $t^p$ over any set with $t$ bounded from below away from zero. □


C Relation to Hirano & Porter (2015)


Hirano & Porter (2015) give a negative result establishing the impossibility of unbiased, quantile unbiased, or translation equivariant estimation in a wide variety of models with singularities, including many linear IV models. On initial inspection our derivation of an unbiased estimator for $\beta$ may appear to contradict the results of Hirano & Porter. In fact, however, one of the key assumptions of Hirano & Porter (2015) no longer applies once we assume that the sign of the first stage is known.

Again consider the linear IV model with a single instrument, where for simplicity we let $\sigma_1^2=\sigma_2^2=1$, $\sigma_{12}=0$. To discuss the results of Hirano & Porter (2015), it will be helpful to parameterize the model in terms of the reduced-form parameters $(\psi,\pi)=(\pi\beta,\pi)$. For $\phi$ again the standard normal density, the density of $\xi$ is

$$f(\xi;\psi,\pi)=\phi(\xi_1-\psi)\,\phi(\xi_2-\pi).$$

Fix some value $\psi^*$. For any $\pi\ne0$ we can define $\beta(\psi,\pi)=\psi/\pi$. If we consider any sequence $\{\pi_j\}_{j=1}^\infty$ approaching zero from the right, then $\beta(\psi^*,\pi_j)\to\infty$ if $\psi^*>0$ and $\beta(\psi^*,\pi_j)\to-\infty$ if $\psi^*<0$. Thus we can see that $\beta$ plays the role of the function $k$ in Hirano & Porter (2015) equation (2.1).


Hirano & Porter (2015) show that if there exists some finite collection of parameter values $(\psi_{l,d},\pi_{l,d})$ in the parameter space and non-negative constants $c_{l,d}$ such that their Assumption 2.4,

$$f(\xi;\psi^*,0)\le\sum_{l=1}^sc_{l,d}\,f(\xi;\psi_{l,d},\pi_{l,d})\quad\forall\xi,$$

holds, then (since one can easily verify their Assumption 2.3 in the present context) there can exist no unbiased estimator of $\beta$.

This dominance condition fails in the linear IV model with a sign restriction. For any $(\psi_{l,d},\pi_{l,d})$ in the parameter space, we have by definition that $\pi_{l,d}>0$. For any such $\pi_{l,d}$, however, if we fix $\xi_1$ and take $\xi_2\to-\infty$,

$$\lim_{\xi_2\to-\infty}\frac{\phi(\xi_2-\pi_{l,d})}{\phi(\xi_2)}=\lim_{\xi_2\to-\infty}\exp\left(-\frac12(\xi_2-\pi_{l,d})^2+\frac12\xi_2^2\right)=\lim_{\xi_2\to-\infty}\exp\left(\xi_2\pi_{l,d}-\frac12\pi_{l,d}^2\right)=0.$$

Thus, $\lim_{\xi_2\to-\infty}\frac{f(\xi;\psi_{l,d},\pi_{l,d})}{f(\xi;\psi^*,0)}=0$, and for any fixed $\xi_1$, $\{c_{l,d}\}_{l=1}^s$ and $\{(\psi_{l,d},\pi_{l,d})\}_{l=1}^s$ there exists a $\bar\xi_2$ such that $\xi_2<\bar\xi_2$ implies

$$f(\xi;\psi^*,0)>\sum_{l=1}^sc_{l,d}\,f(\xi;\psi_{l,d},\pi_{l,d}).$$

Thus, Assumption 2.4 in Hirano & Porter (2015) fails in this model, allowing the possibility of an unbiased estimator. Note, however, that if we did not impose $\pi>0$ then we would satisfy Assumption 2.4, so unbiased estimation of $\beta$ would again be impossible. Thus, the sign restriction on $\pi$ plays a central role in the construction of the unbiased estimator $\hat\beta_U$.
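The failure of the dominance condition can be illustrated numerically with the unit-variance density $f(\xi;\psi,\pi)=\phi(\xi_1-\psi)\phi(\xi_2-\pi)$. The sketch below fixes $\xi_1$, takes a hypothetical finite mixture with all $\pi_{l,d}>0$, and steps $\xi_2$ downward until $f(\xi;\psi^*,0)$ exceeds the mixture. The step size and search limit are arbitrary choices, and sufficiently negative $\xi_2$ would underflow in double precision, so this only illustrates the limit argument rather than proving it.

```python
import math


def std_normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def density(xi1, xi2, psi, pi):
    """f(xi; psi, pi) = phi(xi1 - psi) phi(xi2 - pi) (sigma_1^2 = sigma_2^2 = 1, sigma_12 = 0)."""
    return std_normal_pdf(xi1 - psi) * std_normal_pdf(xi2 - pi)


def find_violation(psi_star, comps, xi1=0.0, step=0.5, max_steps=60):
    """Step xi2 down from 0 until f(xi; psi*, 0) > sum_l c_l f(xi; psi_l, pi_l).

    comps: hypothetical list of (c_l, psi_l, pi_l) with pi_l > 0.
    Returns the first such xi2 on the grid, or None if none is found.
    """
    xi2 = 0.0
    for _ in range(max_steps):
        lhs = density(xi1, xi2, psi_star, 0.0)
        rhs = sum(c * density(xi1, xi2, psi, pi) for c, psi, pi in comps)
        if lhs > rhs:
            return xi2
        xi2 -= step
    return None


comps = [(50.0, 1.0, 0.5), (10.0, 0.0, 2.0)]
print(find_violation(1.0, comps))
```

With these made-up components the ratio of the mixture to $f(\xi;\psi^*,0)$ behaves like $50\exp(0.5\xi_2-0.125)$ for very negative $\xi_2$, so the violation appears once $\xi_2$ is below about $-7.6$.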


D Lower Bound on Risk of Unbiased Estimators 

This appendix gives a lower bound on the attainable risk at a given $\pi,\beta$ for an estimator that is unbiased for $\beta$ for all $\pi,\beta$ with $\pi$ in the positive orthant. The bound is given by the risk in the submodel where $\pi/\|\pi\|$ (the direction of $\pi$) is known. While the bound cannot, in general, be attained, we discuss some situations where it can, which include certain values of $\pi$ in the case where $\Sigma$ comes from a model with homoskedastic errors.


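Theorem D.1. Let $\mathcal U$ be the set of estimators for $\beta$ that are unbiased for all $\pi\in(0,\infty)^k$, $\beta\in\mathbb R$. For any $\pi^*\in(0,\infty)^k$, $\beta^*\in\mathbb R$ and any convex loss function $\ell$,

$$E_{\pi^*,\beta^*}\,\ell\left(\hat\beta_U(\xi^*(\pi^*),\Sigma^*(\pi^*))-\beta^*\right)\le\inf_{\hat\beta\in\mathcal U}E_{\pi^*,\beta^*}\,\ell(\hat\beta-\beta^*),$$

where $\xi^*(\pi^*)=\left[(I_2\otimes\pi^*)'\Sigma^{-1}(I_2\otimes\pi^*)\right]^{-1}(I_2\otimes\pi^*)'\Sigma^{-1}\xi$ and $\Sigma^*(\pi^*)=\left[(I_2\otimes\pi^*)'\Sigma^{-1}(I_2\otimes\pi^*)\right]^{-1}$.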

Proof. Consider the submodel with $\pi$ restricted to $\Pi^*=\{\pi^*t\mid t\in(0,\infty)\}$. Then $\xi^*(\pi^*)$ is sufficient for $(t,\beta)$ in this submodel, and satisfies $\xi^*(\pi^*)\sim N((\beta t,t)',\Sigma^*(\pi^*))$ in this submodel. To see this, note that, for $t,\beta$ in this submodel, $\xi$ follows the generalized least squares regression model $\xi=(I_2\otimes\pi^*)(\beta t,t)'+\varepsilon$ where $\varepsilon\sim N(0,\Sigma)$, and $\xi^*(\pi^*)$ is the generalized least squares estimator of $(\beta t,t)'$.

Let $\tilde\beta(\xi^*(\pi^*),\Sigma^*(\pi^*))$ be a (possibly randomized) estimator based on $\xi^*(\pi^*)$ that is unbiased in the submodel where $\pi\in\Pi^*$. By completeness of the submodel, $E\left[\tilde\beta(\xi^*(\pi^*),\Sigma^*(\pi^*))\mid\xi^*(\pi^*)\right]=\hat\beta_U(\xi^*(\pi^*),\Sigma^*(\pi^*))$. By Jensen's inequality, therefore,

$$E_{\pi^*,\beta^*}\,\ell\left(\tilde\beta(\xi^*(\pi^*),\Sigma^*(\pi^*))-\beta^*\right)\ge E_{\pi^*,\beta^*}\,\ell\left(E\left[\tilde\beta(\xi^*(\pi^*),\Sigma^*(\pi^*))\mid\xi^*(\pi^*)\right]-\beta^*\right)$$

(this is just Rao-Blackwell applied to the submodel with the loss function $\ell$). By sufficiency, the set of risk functions for randomized unbiased estimators based on $\xi^*(\pi^*)$ in the submodel is the same as the set of risk functions for randomized unbiased estimators based on $\xi$ in the submodel. This gives the result with $\mathcal U$ replaced by the set of estimators that are unbiased in the submodel, which implies the result as stated, since the set of estimators which are unbiased in the full model is a subset of those which are unbiased in the submodel. □


Theorem D.1 continues to hold in the case where the lower bound is infinite: in this case, the risk of any unbiased estimator must be infinite at $\beta^*,\pi^*$. By Theorem 2.2, the lower bound is infinite for squared error loss $\ell(t)=t^2$ for any $\pi^*,\beta^*$. Thus, unbiased estimators must have infinite variance even in models with multiple instruments.
While in general Theorem D.1 gives only a lower bound on the risk of unbiased estimators, the bound can be achieved in certain situations. A case of particular interest arises in models with homoskedastic reduced form errors that are independent across observations. In such cases $\operatorname{Var}((U',V')')=\operatorname{Var}((U_t,V_t)')\otimes I_T$, where $I_T$ is the $T\times T$ identity matrix, so that the definition of $\Sigma$ in (3) gives $\Sigma=\operatorname{Var}((U_t,V_t)')\otimes(Z'Z)^{-1}$. Thus, in models with independent homoskedastic errors we have $\Sigma=Q_{UV}\otimes Q_Z$ for a $2\times2$ matrix $Q_{UV}$ and a $k\times k$ matrix $Q_Z$.

Theorem D.2. Suppose that $\left[(I_2\otimes\pi^*)'\Sigma^{-1}(I_2\otimes\pi^*)\right]^{-1}(I_2\otimes\pi^*)'\Sigma^{-1}=(I_2\otimes a(\pi^*)')$ for some $a(\pi^*)\in\mathbb R^k$. Then $\hat\beta_U(\xi^*(\pi^*),\Sigma^*(\pi^*))$ defined in Theorem D.1 is unbiased at any $\pi$ such that $a(\pi^*)'\pi>0$. In particular, if $a(\pi^*)\in(0,\infty)^k$, then $\hat\beta_U(\xi^*(\pi^*),\Sigma^*(\pi^*))\in\mathcal U$ and the risk bound is attained. Specializing to the case where $\Sigma=Q_{UV}\otimes Q_Z$ for a $2\times2$ matrix $Q_{UV}$ and a $k\times k$ matrix $Q_Z$, the above conditions hold with $a(\pi^*)'=\pi^{*\prime}Q_Z^{-1}/(\pi^{*\prime}Q_Z^{-1}\pi^*)$, and the bound is achieved if $Q_Z^{-1}\pi^*\in(0,\infty)^k$.

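Proof. For the first claim, note that under these assumptions $\xi^*(\pi^*)=(a(\pi^*)'\xi_1,a(\pi^*)'\xi_2)'$ is $N((a(\pi^*)'\pi\beta,a(\pi^*)'\pi)',\Sigma^*(\pi^*))$ distributed under $\pi,\beta$, so $\hat\beta_U(\xi^*(\pi^*),\Sigma^*(\pi^*))$ is unbiased at $\pi,\beta$ by Theorem 2.1. For the case where $\Sigma=Q_{UV}\otimes Q_Z$, the result follows by properties of the Kronecker product:

$$\left[(I_2\otimes\pi^*)'(Q_{UV}\otimes Q_Z)^{-1}(I_2\otimes\pi^*)\right]^{-1}(I_2\otimes\pi^*)'(Q_{UV}\otimes Q_Z)^{-1}=\left[Q_{UV}^{-1}\otimes\pi^{*\prime}Q_Z^{-1}\pi^*\right]^{-1}\left(Q_{UV}^{-1}\otimes\pi^{*\prime}Q_Z^{-1}\right)=I_2\otimes\left[\pi^{*\prime}Q_Z^{-1}/(\pi^{*\prime}Q_Z^{-1}\pi^*)\right].$$

□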
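The Kronecker-product simplification in the proof of Theorem D.2 can be checked on a small example. The sketch below builds the GLS projection $[(I_2\otimes\pi^*)'\Sigma^{-1}(I_2\otimes\pi^*)]^{-1}(I_2\otimes\pi^*)'\Sigma^{-1}$ with $\Sigma=Q_{UV}\otimes Q_Z$ and compares it with $I_2\otimes[\pi^{*\prime}Q_Z^{-1}/(\pi^{*\prime}Q_Z^{-1}\pi^*)]$. To stay within the standard library it uses a small hand-written Gauss-Jordan inverse, adequate only for small, well-conditioned matrices; $Q_{UV}$, $Q_Z$ and $\pi^*$ are made-up inputs.

```python
def matmul(A, B):
    """Product of matrices stored as lists of rows."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def kron(A, B):
    """Kronecker product A (x) B."""
    return [[A[i][j] * B[p][q] for j in range(len(A[0])) for q in range(len(B[0]))]
            for i in range(len(A)) for p in range(len(B))]


def inverse(A):
    """Gauss-Jordan inverse with partial pivoting (small, well-conditioned A only)."""
    n = len(A)
    M = [[float(v) for v in row] + [float(i == j) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]


I2 = [[1.0, 0.0], [0.0, 1.0]]


def gls_projection(pi_star, Q_uv, Q_z):
    """[(I2 (x) pi*)' S^-1 (I2 (x) pi*)]^-1 (I2 (x) pi*)' S^-1 with S = Q_uv (x) Q_z."""
    P = kron(I2, [[v] for v in pi_star])   # 2k x 2
    S_inv = inverse(kron(Q_uv, Q_z))        # 2k x 2k
    Pt_S_inv = matmul(transpose(P), S_inv)  # 2 x 2k
    return matmul(inverse(matmul(Pt_S_inv, P)), Pt_S_inv)


def kron_closed_form(pi_star, Q_z):
    """I2 (x) a(pi*)' with a(pi*)' = pi*' Q_z^-1 / (pi*' Q_z^-1 pi*)."""
    row = matmul([pi_star], inverse(Q_z))[0]
    scale = sum(r * v for r, v in zip(row, pi_star))
    return kron(I2, [[r / scale for r in row]])


pi_star = [1.0, 2.0]
Q_uv = [[1.0, 0.3], [0.3, 2.0]]
Q_z = [[2.0, 0.5], [0.5, 1.0]]
A = gls_projection(pi_star, Q_uv, Q_z)
B = kron_closed_form(pi_star, Q_z)
print(max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb)))
```

Note that $Q_{UV}$ drops out of the result, which is why the optimal estimator in the homoskedastic submodel depends on $\pi^*$ and $Q_Z$ only.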

The special form of the sufficient statistic in the homoskedastic case derives from the form of the optimal estimator in the restricted seemingly unrelated regression (SUR) model. The submodel for the direction $\pi^*$ is given by the SUR model

$$\begin{pmatrix}Y\\X\end{pmatrix}=\begin{pmatrix}Z\pi^*&0\\0&Z\pi^*\end{pmatrix}\begin{pmatrix}\beta t\\t\end{pmatrix}+\begin{pmatrix}U\\V\end{pmatrix}.$$

Considering this as a SUR model with regressors $Z\pi^*$ in both equations, the optimal estimator of $(\beta t,t)'$ simply stacks the OLS estimator for the two equations, since the regressors $Z\pi^*$ are the same and the parameter space for $(\beta t,t)$ is unrestricted. Note also that, in the homoskedastic case (with $Q_Z=(Z'Z)^{-1}$), $\xi_1^*(\pi^*)$ and $\xi_2^*(\pi^*)$ are proportional to $\pi^{*\prime}Z'Z\xi_1$ and $\pi^{*\prime}Z'Z\xi_2$, which are the numerator and denominator of the 2SLS estimator with $\xi_2$ replaced by $\pi^*$ in the first part of the quadratic form.

Thus, for certain parameter values $\pi^*$ in the homoskedastic case, the risk bound in Theorem D.1 is attained. In such cases, the estimator that attains the bound is unique, and depends on $\pi^*$ itself (for the absolute value loss function, which is not strictly convex, uniqueness is shown in Section D.1 below). Thus, in contrast to settings such as linear regression, where a single estimator minimizes the risk over unbiased estimators simultaneously for all parameter values, no uniform minimum risk unbiased estimator will exist. The reason for this is clear: knowledge of the direction of $\pi=\pi^*$ helps with estimation of $\beta$, even if one imposes unbiasedness for all $\pi$.

It is interesting to note precisely how the parameter space over which the estimator 
in the risk bound is unbiased depends on n*. Suppose one wants an estimator that 
minimizes the risk at 7 r* while still remaining unbiased in a small neighborhood of 7r*. 
In the homoskedastic case, this can always be done so long as n* G (0, oo) k , since 
> 0 for 7 r close enough to n*. Where one can expand this neighborhood while 
maintaining unbiasedness will depend on n* and Qz- In the case where 7is in the 
positive orthant, the assumption 7r G (0, oo) k is enough to ensure that this estimator 
is unbiased at ir. However, if 7T*' Q is not in the positive orthant, there is a tradeoff 
between precision at n* and the range of 7r G (0, oo) k over which unbiasedness can be 
maintained. 

Put another way, in the homoskedastic case, for any $\pi^* \in \mathbb{R}^k \setminus \{0\}$, minimizing the risk of an estimator of $\beta$ subject to the restriction that the estimator is unbiased in a neighborhood of $\pi^*$ leads to an estimator that does not depend on this neighborhood, so long as the neighborhood is small enough (this is true even if the restriction $\pi^* \in (0,\infty)^k$ does not hold). The resulting estimator depends on $\pi^*$, and is unbiased at $\pi$ iff

$$\pi^{*\prime}Q_Z^{-1}\pi > 0.$$
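The unbiasedness condition can be checked directly. The sketch below uses hypothetical values for `ZtZ` (playing the role of $Q_Z^{-1} = Z'Z$) and for $\pi^*$, and illustrates both cases discussed above: when $\pi^{*\prime}Q_Z^{-1}$ lies in the positive orthant the condition holds at every positive $\pi$ on a grid, and when it does not, some positive $\pi$ violate it even though it still holds at $\pi^*$ itself.

```python
import itertools


def row_times_matrix(v, M):
    """Return the row vector v' M."""
    n = len(v)
    return [sum(v[i] * M[i][j] for i in range(n)) for j in range(n)]


def unbiased_at(pi, pi_star, ZtZ):
    """Check pi*' Q_Z^{-1} pi > 0, where Q_Z^{-1} = Z'Z in the homoskedastic case."""
    w = row_times_matrix(pi_star, ZtZ)
    return sum(wj * pj for wj, pj in zip(w, pi)) > 0


pi_star = [1.0, 0.1]
ZtZ_pos = [[1.0, 0.3], [0.3, 1.0]]    # pi*' Z'Z = (1.03, 0.4): positive orthant
ZtZ_neg = [[1.0, -0.9], [-0.9, 1.0]]  # pi*' Z'Z = (0.91, -0.8): not in positive orthant

grid = [0.1, 0.5, 1.0, 2.0, 5.0]
positive_pis = [list(p) for p in itertools.product(grid, repeat=2)]

all_ok_pos = all(unbiased_at(p, pi_star, ZtZ_pos) for p in positive_pis)
failures_neg = [p for p in positive_pis if not unbiased_at(p, pi_star, ZtZ_neg)]
print(all_ok_pos, failures_neg[:3])
```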
D.1 Uniqueness of the Minimum Risk Unbiased Estimator under Absolute Value Loss


In the discussion above, we used the result that the minimum risk unbiased estimator in the submodel with $\pi/\|\pi\|$ known is unique for absolute value loss. Because the absolute value loss function is not strictly convex, this result does not, to our knowledge, follow immediately from results in the literature. We therefore provide a statement and proof here. In the following theorem, we consider a general setup where a random variable $\xi$ is observed, which follows a distribution $P_\mu$ for some $\mu \in M$. Unlike in the rest of this paper, the family of distributions $\{P_\mu \mid \mu \in M\}$ need not be multivariate normal.

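Theorem D.3. Let $\hat\theta = \hat\theta(\xi)$ be an unbiased estimator of $\theta = \theta(\mu)$, where $\mu \in M$ for some parameter space $M$ and $\Theta = \{\theta \mid \theta(\mu) = \theta \text{ for some } \mu \in M\} \subseteq \mathbb{R}$, and where $\xi$ has the same support for all $\mu \in M$. Let $\tilde\theta(\xi, U)$ be another unbiased estimator, based on $(\xi, U)$, where $\xi$ and $U$ are independent and $\hat\theta(\xi) = E_\mu[\tilde\theta(\xi, U) \mid \xi] = \int \tilde\theta(\xi, U)\, dQ(U)$, where $Q$ denotes the probability measure of $U$, which is assumed not to depend on $\mu$. Suppose that $\hat\theta(\xi)$ and $\tilde\theta(\xi, U)$ have the same risk under absolute value loss:

$$E_\mu|\tilde\theta(\xi, U) - \theta(\mu)| = E_\mu|\hat\theta(\xi) - \theta(\mu)| \quad \text{for all } \mu \in M.$$

Then $\tilde\theta(\xi, U) = \hat\theta(\xi)$ for almost every $\xi$ with $\hat\theta(\xi) \in \Theta$.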


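Proof. The display can be written as

$$E_\mu\left[ E_\mu\left[ |\tilde\theta(\xi, U) - \theta(\mu)| \,\middle|\, \xi \right] - |\hat\theta(\xi) - \theta(\mu)| \right] = 0 \quad \text{for all } \mu \in M.$$

By Jensen's inequality, the term inside the outer expectation is nonnegative for $P_\mu$-almost every $\xi$. Thus, the equality implies that this term is zero for $P_\mu$-almost every $\xi$ (since $EX = 0$ implies $X = 0$ a.e. for any nonnegative random variable $X$). Noting that $E_\mu[|\tilde\theta(\xi, U) - \theta(\mu)| \mid \xi] = \int |\tilde\theta(\xi, U) - \theta(\mu)|\, dQ(U)$, this gives

$$\int |\tilde\theta(\xi, U) - \theta(\mu)|\, dQ(U) = |\hat\theta(\xi) - \theta(\mu)| \quad \text{for } P_\mu\text{-almost every } \xi \text{ and all } \mu \in M.$$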
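Since the support of $\xi$ is the same under all $\mu \in M$, the above statement gives

$$\int |\tilde\theta(\xi, U) - \theta|\, dQ(U) = |\hat\theta(\xi) - \theta| \quad \text{for almost every } \xi \text{ and all } \theta \in \Theta.$$

Note that, for any random variable $X$, $E|X| = |EX|$ implies that either $X \ge 0$ a.e. or $X \le 0$ a.e. Applying this to the above display with $X = \tilde\theta(\xi, U) - \theta$, viewed as a function of $U \sim Q$ for fixed $\xi$, it follows that for all $\theta \in \Theta$ and almost every $\xi$, either $\tilde\theta(\xi, U) \le \theta$ a.e. $U$ or $\tilde\theta(\xi, U) \ge \theta$ a.e. $U$. In particular, whenever $\hat\theta(\xi) \in \Theta$, either $\tilde\theta(\xi, U) \le \hat\theta(\xi)$ a.e. $U$ or $\tilde\theta(\xi, U) \ge \hat\theta(\xi)$ a.e. $U$. In either case, the condition $\int \tilde\theta(\xi, U)\, dQ(U) = \hat\theta(\xi)$ implies that $\tilde\theta(\xi, U) = \hat\theta(\xi)$ a.e. $U$. It follows that, for almost every $\xi$ such that $\hat\theta(\xi) \in \Theta$, we have $\tilde\theta(\xi, U) = \hat\theta(\xi)$ a.e. $U$, as claimed. □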
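The key step of the proof, that $E|X| = |EX|$ forces $X$ not to change sign, can be checked on discrete distributions. In the sketch below (helper names hypothetical), `X` plays the role of $\tilde\theta(\xi, U) - \theta$ for a fixed $\xi$, so the Jensen gap is the excess absolute-value risk of the randomized estimator at $\theta$. Taking $\theta = \hat\theta(\xi)$ makes $X$ mean zero, and a zero gap then leaves no room for any randomization.

```python
def jensen_gap(values, probs):
    """Return E|X| - |E X| for a discrete random variable X (always >= 0)."""
    mean = sum(p * x for x, p in zip(values, probs))
    mean_abs = sum(p * abs(x) for x, p in zip(values, probs))
    return mean_abs - abs(mean)


def changes_sign(values, probs):
    """True if X puts positive probability strictly above and strictly below zero."""
    above = any(p > 0 and x > 0 for x, p in zip(values, probs))
    below = any(p > 0 and x < 0 for x, p in zip(values, probs))
    return above and below


cases = [
    ([2.0, 4.0], [0.5, 0.5]),              # one-sided: gap is zero
    ([-3.0, -1.0, 0.0], [0.2, 0.3, 0.5]),  # weakly one-sided: gap is zero
    ([-1.0, 1.0], [0.5, 0.5]),             # mean zero and changes sign: gap is 1
    ([-0.5, 2.0], [0.4, 0.6]),             # changes sign: gap is positive
]
for values, probs in cases:
    print(changes_sign(values, probs), jensen_gap(values, probs))
```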

Thus, if $\hat\theta(\xi) \in \Theta$ with probability one, we will have $\tilde\theta(\xi, U) = \hat\theta(\xi)$ a.e. $(\xi, U)$. However, if $\hat\theta(\xi)$ can take values outside $\Theta$, this will not necessarily be the case. For example, in the single instrument case of our setup, if we restrict our parameter space to $(\pi, \beta) \in (0,\infty) \times [c,\infty)$ for some constant $c$, then forming a new estimator by adding or subtracting 1 from $\hat\beta_U$ with equal probability, independently of $\xi$, whenever $\hat\beta_U < c - 1$ gives an unbiased estimator with identical absolute value risk: both randomized values then remain below $c \le \beta$, so the absolute deviation averages back to $|\hat\beta_U - \beta|$.
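The counterexample can be verified pointwise: conditional on $\xi$, the randomized estimator below has the same mean as $\hat\beta_U$ and, for every $\beta \ge c$, the same absolute-value risk. The sketch treats the realized value `b` of $\hat\beta_U$ as given, so it does not need the formula for $\hat\beta_U$; the constant `c`, the grids, and the function names are hypothetical.

```python
def randomized(b, u, c):
    """Add u in {-1, +1} to the estimate b, but only when b < c - 1."""
    return b + u if b < c - 1 else b


def conditional_mean(b, c):
    """E_U[randomized(b, U, c)] for U = +/-1 with probability 1/2 each."""
    return 0.5 * randomized(b, 1.0, c) + 0.5 * randomized(b, -1.0, c)


def conditional_risk(b, beta, c):
    """E_U|randomized(b, U, c) - beta| for U = +/-1 with probability 1/2 each."""
    return 0.5 * abs(randomized(b, 1.0, c) - beta) + 0.5 * abs(randomized(b, -1.0, c) - beta)


c = 0.0
b_grid = [-10.0, -3.5, -1.2, -0.9, 0.3, 4.0]  # realized values of beta_hat_U
beta_grid = [0.0, 0.5, 2.0, 10.0]             # parameter values in [c, infinity)

max_mean_diff = max(abs(conditional_mean(b, c) - b) for b in b_grid)
max_risk_diff = max(abs(conditional_risk(b, beta, c) - abs(b - beta))
                    for b in b_grid for beta in beta_grid)
print(max_mean_diff, max_risk_diff)

# With beta = b = -3.5 (outside [c, inf)), the randomization would cost risk:
print(conditional_risk(-3.5, -3.5, c))  # 1.0, versus 0.0 without randomization
```

The last line shows why Theorem D.3 needs $\hat\theta(\xi) \in \Theta$: the randomization is harmless only because no admissible $\beta$ lies between the two randomized values.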

In our case, letting $\xi(\pi^*)$ be as in Theorem D.1, the support of $\xi(\pi^*)$ is the same under $(\pi^* t, \beta)$ for any $t \in (0,\infty)$ and $\beta \in \mathbb{R}$. Let $\hat\beta_U(\xi(\pi^*), \Sigma(\pi^*))$ denote the unbiased nonrandomized estimator in the submodel. If $\tilde\beta(\xi(\pi^*), U)$ is unbiased in this restricted parameter space, then by completeness we must have $E[\tilde\beta(\xi(\pi^*), U) \mid \xi(\pi^*)] = \hat\beta_U(\xi(\pi^*), \Sigma(\pi^*))$ for any random variable $U$ with a distribution that does not depend on $(t, \beta)$. Since $\hat\beta_U(\xi(\pi^*), \Sigma(\pi^*)) \in \mathbb{R}$ with probability one, it follows that if $\tilde\beta(\xi(\pi^*), U)$ has the same risk as $\hat\beta_U(\xi(\pi^*), \Sigma(\pi^*))$, then the two estimators are equal with probability one, so long as we impose that $\tilde\beta(\xi(\pi^*), U)$ is unbiased for all $t \in (1 - \varepsilon, 1 + \varepsilon)$ and $\beta \in \mathbb{R}$.

E Reduction of the Parameter Space by Equivariance 

In this appendix, we discuss how we can reduce the dimension of the parameter space using an equivariance argument. We first consider the just-identified case and then note how such arguments may be extended to the over-identified case under the additional assumption of homoskedasticity.


E.1 Just-Identified Model


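For comparisons between $\hat\beta_U$, $\hat\beta_{2SLS}$, and $\hat\beta_{Full}$ in the just-identified case, it suffices to consider a two-dimensional parameter space. To see that this is the case, let $\theta = (\beta, \pi, \sigma_1^2, \sigma_{12}, \sigma_2^2)$ be the vector of model parameters and, for $A = \begin{pmatrix} a_1 & a_2 \\ 0 & a_3 \end{pmatrix}$ with $a_1 \neq 0$ and $a_3 > 0$, let $g_A$ be the transformation

$$g_A\xi = \tilde\xi = A\xi = \begin{pmatrix} a_1\xi_1 + a_2\xi_2 \\ a_3\xi_2 \end{pmatrix},$$

which leads to $\tilde\xi$ being distributed according to the parameters $\tilde\theta = (\tilde\beta, \tilde\pi, \tilde\sigma_1^2, \tilde\sigma_{12}, \tilde\sigma_2^2)$, where

$$\tilde\beta = \frac{a_1\beta + a_2}{a_3}, \qquad \tilde\pi = a_3\pi,$$

and

$$\tilde\sigma_1^2 = a_1^2\sigma_1^2 + 2a_1a_2\sigma_{12} + a_2^2\sigma_2^2, \qquad \tilde\sigma_{12} = a_1a_3\sigma_{12} + a_2a_3\sigma_2^2, \qquad \tilde\sigma_2^2 = a_3^2\sigma_2^2.$$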

Define $\mathcal{G}$ as the set of all transformations $g_A$ of the form above. Note that the sign restriction on $\pi$ is preserved under $g_A \in \mathcal{G}$, and that for each $g_A$ there exists another transformation $g_{A^{-1}} \in \mathcal{G}$ such that $g_A g_{A^{-1}}$ is the identity transformation. We can see that the model (2) is invariant under the transformation $g_A$. Note further that the estimators $\hat\beta_U$, $\hat\beta_{2SLS}$, and $\hat\beta_{Full}$ are all equivariant under $g_A$, in the sense that

$$\hat\beta(g_A\xi) = \frac{a_1\hat\beta(\xi) + a_2}{a_3}.$$
Thus, for any properties of these estimators (e.g. relative mean and median bias, relative dispersion) which are preserved under the transformations $g_A$, it suffices to study these properties on the reduced parameter space obtained by equivariance. By choosing $A$ appropriately, we can always obtain

$$\tilde\beta = 0, \qquad \tilde\sigma_1^2 = \tilde\sigma_2^2 = 1$$

for $\tilde\pi > 0$, $\tilde\sigma_{12} \ge 0$, and thus reduce to a two-dimensional parameter $(\pi, \sigma_{12})$ with $\sigma_{12} \in [0, 1)$, $\pi > 0$.
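The reduction can be checked numerically. The sketch below applies $g_A$ to the data and the variances, builds one choice of $A$ that yields $\tilde\beta = 0$, $\tilde\sigma_1^2 = \tilde\sigma_2^2 = 1$ and $\tilde\sigma_{12} \ge 0$ (namely $a_2 = -a_1\beta$, $a_3 = 1/\sigma_2$, and $|a_1|$ equal to one over the standard deviation of $\xi_1 - \beta\xi_2$, with its sign chosen to make $\tilde\sigma_{12}$ nonnegative), and confirms the equivariance property for $\hat\beta_{2SLS} = \xi_1/\xi_2$ and $\hat\beta_U$. For $\hat\beta_U$ it assumes the single-instrument form from the main text, $\hat\beta_U = \hat\tau(\xi_2, \sigma_2^2)(\xi_1 - \frac{\sigma_{12}}{\sigma_2^2}\xi_2) + \frac{\sigma_{12}}{\sigma_2^2}$ with $\hat\tau(\xi_2, \sigma_2^2) = \frac{1}{\sigma_2}\frac{1 - \Phi(\xi_2/\sigma_2)}{\phi(\xi_2/\sigma_2)}$. The function names and numerical values are illustrative.

```python
import math


def tau_hat(xi2, s2):
    """(1/s2) * (1 - Phi(xi2/s2)) / phi(xi2/s2): unbiased for 1/pi when xi2 ~ N(pi, s2^2)."""
    x = xi2 / s2
    upper_tail = 0.5 * math.erfc(x / math.sqrt(2.0))             # 1 - Phi(x)
    density = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)  # phi(x)
    return upper_tail / (density * s2)


def beta_u(xi1, xi2, s1sq, s12, s2sq):
    """Single-instrument unbiased estimator (assumed form, see lead-in)."""
    d = s12 / s2sq
    return tau_hat(xi2, math.sqrt(s2sq)) * (xi1 - d * xi2) + d


def beta_2sls(xi1, xi2, s1sq, s12, s2sq):
    """Just-identified 2SLS (IV) estimator; the variances are unused."""
    return xi1 / xi2


def apply_g(A, xi1, xi2, s1sq, s12, s2sq):
    """Transform the data xi and the reduced-form variances under g_A."""
    a1, a2, a3 = A
    return (a1 * xi1 + a2 * xi2,
            a3 * xi2,
            a1**2 * s1sq + 2 * a1 * a2 * s12 + a2**2 * s2sq,
            a1 * a3 * s12 + a2 * a3 * s2sq,
            a3**2 * s2sq)


def reducing_A(beta, s1sq, s12, s2sq):
    """One choice of A = (a1, a2, a3) giving beta~ = 0, s1~ = s2~ = 1, s12~ >= 0."""
    s = math.sqrt(s1sq - 2 * beta * s12 + beta**2 * s2sq)  # sd of xi1 - beta * xi2
    a1 = 1.0 / s if s12 - beta * s2sq >= 0 else -1.0 / s
    return (a1, -a1 * beta, 1.0 / math.sqrt(s2sq))


# Hypothetical parameter values and data.
beta, pi = 0.7, 1.5
s1sq, s12, s2sq = 2.0, 0.9, 1.3
xi1, xi2 = 0.8, 1.2

A = reducing_A(beta, s1sq, s12, s2sq)
a1, a2, a3 = A
beta_t, pi_t = (a1 * beta + a2) / a3, a3 * pi
transformed = apply_g(A, xi1, xi2, s1sq, s12, s2sq)
_, _, s1sq_t, s12_t, s2sq_t = transformed
print(beta_t, pi_t, s1sq_t, s12_t, s2sq_t)

# Equivariance: beta_hat(g_A xi) = (a1 * beta_hat(xi) + a2) / a3.
for est in (beta_2sls, beta_u):
    lhs = est(*transformed)
    rhs = (a1 * est(xi1, xi2, s1sq, s12, s2sq) + a2) / a3
    print(est.__name__, lhs, rhs)
```

After the transformation only $\tilde\pi$ and $\tilde\sigma_{12}$ remain free, which is the two-dimensional parameter used in the just-identified simulations.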


E.2 Over-Identified Model under Homoskedasticity 


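As noted in Appendix D, under the assumption of iid homoskedastic errors, $\Sigma$ is of the form $\Sigma = Q_{UV} \otimes Q_Z$ for the matrix $Q_{UV} = Var((U_i, V_i)')$ and $Q_Z = (Z'Z)^{-1}$. If we let $\sigma_U^2 = Var(U_i)$, $\sigma_V^2 = Var(V_i)$, and $\sigma_{UV} = Cov(U_i, V_i)$, then using an equivariance argument as above we can eliminate the parameters $\sigma_U^2$, $\sigma_V^2$, and $\beta$ for the purposes of comparing $\hat\beta_{2SLS}$, $\hat\beta_{Full}$, and the unbiased estimators. In particular, define $\theta = (\beta, \pi, \sigma_U^2, \sigma_{UV}, \sigma_V^2, Q_Z)$, again let $A = \begin{pmatrix} a_1 & a_2 \\ 0 & a_3 \end{pmatrix}$ with $a_1 \neq 0$ and $a_3 > 0$, and consider the transformation

$$g_A\xi = \tilde\xi = (A \otimes I_k)\xi,$$

which leads to $\tilde\xi$ being distributed according to the parameters $\tilde\theta = (\tilde\beta, \tilde\pi, \tilde\sigma_U^2, \tilde\sigma_{UV}, \tilde\sigma_V^2, \tilde Q_Z)$, where

$$\tilde\beta = \frac{a_1\beta + a_2}{a_3}, \qquad \tilde\pi = a_3\pi,$$

$$\tilde\sigma_U^2 = a_1^2\sigma_U^2 + 2a_1a_2\sigma_{UV} + a_2^2\sigma_V^2, \qquad \tilde\sigma_{UV} = a_1a_3\sigma_{UV} + a_2a_3\sigma_V^2, \qquad \tilde\sigma_V^2 = a_3^2\sigma_V^2,$$

and

$$\tilde Q_Z = Q_Z.$$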

Note that this transformation changes neither the direction of the first stage, $\pi/\|\pi\|$, nor $Q_Z$. If we again define $\mathcal{G}$ to be the class of such transformations, we again see that the model is invariant under transformations $g_A \in \mathcal{G}$, and that the estimators for $\beta$ we consider are equivariant under these transformations. Thus, since relative bias and MAD across estimators are preserved under these transformations, we can again study these properties on the reduced parameter space obtained by equivariance. In particular, by choosing $A$ appropriately we can set $\sigma_U = \sigma_V = 1$ and $\beta = 0$, so the remaining free parameters are $\pi$, $\sigma_{UV}$, and $Q_Z$.
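A quick numerical check of the over-identified transformation, assuming $\Sigma = Q_{UV} \otimes Q_Z$ as above: the sketch verifies that $AQ_{UV}A'$ reproduces the formulas for $\tilde\sigma_U^2$, $\tilde\sigma_{UV}$ and $\tilde\sigma_V^2$ (so that $(A \otimes I_k)\Sigma(A \otimes I_k)' = (AQ_{UV}A') \otimes Q_Z$ leaves $Q_Z$ unchanged), and that the homoskedastic 2SLS estimator $\xi_2'Z'Z\xi_1 / \xi_2'Z'Z\xi_2$ is equivariant. All numerical values, including `ZtZ`, are hypothetical, and matrices are stored as plain lists of rows to keep the sketch dependency-free.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][m] * Y[m][j] for m in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(row) for row in zip(*X)]


def quad(a, M, b):
    """Bilinear form a' M b."""
    return sum(a[i] * M[i][j] * b[j] for i in range(len(a)) for j in range(len(b)))


def beta_2sls(xi1, xi2, ZtZ):
    """Homoskedastic 2SLS: xi2' Z'Z xi1 / xi2' Z'Z xi2."""
    return quad(xi2, ZtZ, xi1) / quad(xi2, ZtZ, xi2)


a1, a2, a3 = 1.7, -0.4, 0.6  # any a1 != 0, a3 > 0
A = [[a1, a2], [0.0, a3]]
sU2, sUV, sV2 = 1.5, 0.3, 0.8
Q_UV = [[sU2, sUV], [sUV, sV2]]

Q_UV_t = matmul(matmul(A, Q_UV), transpose(A))
cross = a1 * a3 * sUV + a2 * a3 * sV2
expected = [[a1**2 * sU2 + 2 * a1 * a2 * sUV + a2**2 * sV2, cross],
            [cross, a3**2 * sV2]]
print(Q_UV_t, expected)

ZtZ = [[2.0, 0.5, 0.1], [0.5, 1.0, 0.2], [0.1, 0.2, 1.5]]  # k = 3 instruments
xi1, xi2 = [0.9, 0.4, -0.2], [1.1, 0.7, 0.3]
xi1_t = [a1 * u + a2 * v for u, v in zip(xi1, xi2)]  # first block of (A kron I_k) xi
xi2_t = [a3 * v for v in xi2]                         # second block
lhs = beta_2sls(xi1_t, xi2_t, ZtZ)
rhs = (a1 * beta_2sls(xi1, xi2, ZtZ) + a2) / a3
print(lhs, rhs)
```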

F Additional Simulation Results in Just-Identified Case 

This appendix gives further results for our simulations in the just-identified case. We first report median bias comparisons for the estimators $\hat\beta_U$, $\hat\beta_{2SLS}$, and $\hat\beta_{Full}$, and then report further dispersion and absolute deviation simulation results to complement those in Section 4.1.2 of the paper.

F.1 Median Bias

Figure 3 plots the median bias of the single-instrument IV estimators against the mean of the first stage F statistic. In all calibrations considered, the unbiased estimator has a smaller median bias than 2SLS when the first stage is very small and a larger median bias for larger values of the first stage. By contrast, the median bias of Fuller is larger than that of both the unbiased and 2SLS estimators, though its median bias is quite close to that of the unbiased estimator once the mean of the first stage F statistic exceeds 10.
Figure 3: Median bias of single-instrument estimators, plotted against mean E[F] of first-stage F-statistic, based on 10 million simulations.

F.2 Dispersion Simulation Results 

The lack of moments for $\hat\beta_{2SLS}$ complicates comparisons of dispersion, since we cannot consider mean squared error or mean absolute deviation, and also cannot recenter $\hat\beta_{2SLS}$ around its mean. Instead, we consider the full distribution of the absolute deviation of each estimator from its median. In particular, for the estimators $(\hat\beta_U, \hat\beta_{2SLS}, \hat\beta_{Full})$ we calculate the zero-median residuals

$$(\varepsilon_U, \varepsilon_{2SLS}, \varepsilon_{Full}) = \left(\hat\beta_U - \mathrm{med}(\hat\beta_U),\ \hat\beta_{2SLS} - \mathrm{med}(\hat\beta_{2SLS}),\ \hat\beta_{Full} - \mathrm{med}(\hat\beta_{Full})\right).$$

Our simulation results suggest a strong stochastic ordering between these residuals (in absolute value). In particular, we find that $|\varepsilon_{2SLS}|$ approximately dominates $|\varepsilon_U|$, which in turn approximately dominates $|\varepsilon_{Full}|$, both in the sense of first-order stochastic dominance. This numerical result is consistent with analytical results on the tail behavior of the estimators. In particular, $\hat\beta_{2SLS}$ has no moments, reflecting thick tails in its sampling distribution, while $\hat\beta_{Full}$ has all moments, reflecting thin tails. As noted in Section 2.3, the unbiased estimator $\hat\beta_U$ has a first moment but no more, and so falls between these two extremes.

To check for stochastic dominance in the distribution of $(|\varepsilon_U|, |\varepsilon_{2SLS}|, |\varepsilon_{Full}|)$, we simulated $10^6$ draws of $\hat\beta_U$, $\hat\beta_{2SLS}$, and $\hat\beta_{Full}$ on a grid formed by the Cartesian product of

$$\sigma_{12} \in \{0, (0.005)^{1/2}, (0.01)^{1/2}, \ldots, (0.995)^{1/2}\} \quad \text{and} \quad \pi \in \{(0.01)^2, (0.02)^2, \ldots, 25\}.$$

We use these grids for $\sigma_{12}$ and $\pi$, rather than a uniformly spaced grid, because preliminary simulations suggested that the behavior of the estimators was particularly sensitive to the parameters for large values of $\sigma_{12}$ and small values of $\pi$.

At each point in the grid we calculate $(\varepsilon_U, \varepsilon_{2SLS}, \varepsilon_{Full})$, using independent draws to calculate $\varepsilon_U$ and the other two estimators, and compute a one-sided Kolmogorov-Smirnov statistic for the hypotheses that (i) $|\varepsilon_{2SLS}| \succeq |\varepsilon_U|$ and (ii) $|\varepsilon_U| \succeq |\varepsilon_{Full}|$, where $A \succeq B$ for random variables $A$ and $B$ denotes that $A$ is larger than $B$ in the sense of first-order stochastic dominance. In both cases the maximal value of the Kolmogorov-Smirnov statistic is less than $2 \times 10^{-3}$. Conventional Kolmogorov-Smirnov p-values are not valid in the present context (since we use estimated medians to construct $\varepsilon$), but are never below 0.25.
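A minimal sketch of the dominance check, assuming the one-sided statistic is $\sup_t [\hat F_A(t) - \hat F_B(t)]$ for the hypothesis $A \succeq B$: first-order dominance means $F_A \le F_B$ everywhere, so large values are evidence against it. Normal samples with different scales stand in for the estimators' residuals; they are hypothetical inputs, not the paper's simulation draws. As in the text, each sample is recentred at its own estimated median before taking absolute values, and the two samples are drawn independently.

```python
import bisect
import random
import statistics


def abs_median_residuals(draws):
    """Sorted |x - med(x)|, using the sample median of the draws."""
    m = statistics.median(draws)
    return sorted(abs(x - m) for x in draws)


def one_sided_ks(a_sorted, b_sorted):
    """sup_t [F_A(t) - F_B(t)] for the empirical CDFs of two sorted samples.

    Both empirical CDFs are right-continuous step functions, so the supremum is
    attained at a sample point (or equals 0, the value below all sample points).
    """
    na, nb = len(a_sorted), len(b_sorted)
    stat = 0.0
    for t in a_sorted + b_sorted:
        fa = bisect.bisect_right(a_sorted, t) / na
        fb = bisect.bisect_right(b_sorted, t) / nb
        stat = max(stat, fa - fb)
    return stat


rng = random.Random(0)
n = 20000
wide = abs_median_residuals([rng.gauss(0.0, 2.0) for _ in range(n)])    # thick-tailed stand-in
narrow = abs_median_residuals([rng.gauss(0.0, 1.0) for _ in range(n)])  # thin-tailed stand-in

ks_wide_ge_narrow = one_sided_ks(wide, narrow)  # small: consistent with wide >= narrow
ks_narrow_ge_wide = one_sided_ks(narrow, wide)  # large: evidence against narrow >= wide
print(ks_wide_ge_narrow, ks_narrow_ge_wide)
```

Because the medians are estimated, this statistic is a descriptive measure of how far the empirical distributions are from the dominance ordering rather than the input to an exact test.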

F.3 Containment of $\hat\beta_U$ in Anderson-Rubin Confidence Set

As noted in Section 2.4, the Anderson-Rubin test is uniformly most powerful unbiased in the just-identified model. One can show, however, that the unbiased estimator $\hat\beta_U$ is not always contained in the Anderson-Rubin confidence set (that is, the confidence set formed by collecting all parameter values not rejected by the Anderson-Rubin test). Specifically, consider the case where $\xi_2$ is large and negative, $\xi_1$ is large and positive, and $\sigma_{12}$ is non-negative. In this case, the Anderson-Rubin confidence set will consist solely of negative values, while $\hat\beta_U$ will be large and positive, and so will necessarily lie outside the Anderson-Rubin confidence set.
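This scenario can be reproduced with the just-identified Anderson-Rubin statistic $(\xi_1 - \beta_0\xi_2)^2 / (\sigma_1^2 - 2\beta_0\sigma_{12} + \beta_0^2\sigma_2^2)$ compared with the 95% $\chi^2_1$ critical value. The sketch assumes the single-instrument form of $\hat\beta_U$ from the main text, restated in the docstring; the data values ($\xi_1 = 8$, $\xi_2 = -8$, unit variances, $\sigma_{12} = 0.5$) and function names are hypothetical.

```python
import math

CHI2_1_95 = 3.841458820694124  # 0.95 quantile of chi-squared with 1 df


def tau_hat(xi2, s2):
    """(1/s2) * (1 - Phi(xi2/s2)) / phi(xi2/s2); phi underflows below about -38."""
    x = xi2 / s2
    upper_tail = 0.5 * math.erfc(x / math.sqrt(2.0))             # 1 - Phi(x)
    density = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)  # phi(x)
    return upper_tail / (density * s2)


def beta_u(xi1, xi2, s1sq, s12, s2sq):
    """Assumed form: tau_hat(xi2) * (xi1 - (s12/s2sq) * xi2) + s12/s2sq."""
    d = s12 / s2sq
    return tau_hat(xi2, math.sqrt(s2sq)) * (xi1 - d * xi2) + d


def in_ar_set(beta0, xi1, xi2, s1sq, s12, s2sq, crit=CHI2_1_95):
    """True if beta0 is not rejected by the just-identified AR test."""
    resid = xi1 - beta0 * xi2
    var = s1sq - 2.0 * beta0 * s12 + beta0**2 * s2sq  # Var(xi1 - beta0 * xi2)
    return resid**2 / var <= crit


data = dict(xi1=8.0, xi2=-8.0, s1sq=1.0, s12=0.5, s2sq=1.0)
b_u = beta_u(**data)
accepted = [b for b in (-2.0, -1.0, -0.5, 0.0, 1.0) if in_ar_set(b, **data)]
print(b_u, in_ar_set(b_u, **data))  # very large positive value, outside the AR set
print(accepted)                     # only negative values survive
```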

While this sort of scenario can easily arise if our sign constraint is violated, it 
occurs with only low probability when the sign constraint is satisfied. In particular, as 
in Section F.2, we consider a fine grid of values in the parameter space and simulate the frequency with which the unbiased estimator is contained in the Anderson-Rubin confidence set at each point (based on 100,000 simulations). We find that the probability that the 95% Anderson-Rubin confidence set contains the unbiased estimator $\hat\beta_U$ is always at least 97%, and exceeds 99.8% when the mean of the first stage F statistic is greater than two. Likewise, the probability that the 90% Anderson-Rubin confidence set contains $\hat\beta_U$ is always at least 94.5%, and exceeds 99.3% when the mean of the first stage F statistic is greater than two.


G Multi-Instrument Simulation Design 


This appendix gives further details for the multi-instrument simulation design used in Section 4.2. We base our simulations on the Staiger & Stock specifications for the Angrist & Krueger (1991) data. The instruments in all specifications are quarter of birth and quarter of birth interacted with other dummy variables, and in all cases the dummy for the fourth quarter (along with the corresponding interactions) is excluded to avoid multicollinearity. The rationale for the quarter of birth instrument in Angrist & Krueger (1991) indicates that the first stage coefficients on the instruments should be negative.

We first calculate the OLS estimates $\hat\pi$. All estimated coefficients satisfy the sign restriction in specification I, but some of them violate it in specifications II, III, and IV. To enforce the sign restriction, we calculate the posterior mean for $\pi$ conditional on the OLS estimates, assuming a flat prior on the negative orthant and an exact normal distribution for the OLS estimates with variance equal to the estimated variance. This yields an estimate

$$\tilde\pi_i = \hat\pi_i - \hat\sigma_i \frac{\phi(\hat\pi_i/\hat\sigma_i)}{1 - \Phi(\hat\pi_i/\hat\sigma_i)}$$

for the first-stage coefficient on instrument $i$, where $\hat\pi_i$ is the OLS estimate and $\hat\sigma_i$ is its standard error. When $\hat\pi_i$ is highly negative relative to $\hat\sigma_i$, $\tilde\pi_i$ will be close to $\hat\pi_i$, but otherwise $\tilde\pi_i$ ensures that our first stage estimates all obey the sign constraint. We then conduct the simulations using $\pi^* = -\tilde\pi$ to cast the sign constraint in the form $\pi \in (0,\infty)^k$ considered in the main text.
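The sign-enforcing adjustment is the mean of a normal distribution truncated to $(-\infty, 0)$, applied coefficient by coefficient. A minimal standard-library sketch follows; the coefficient estimates and standard errors are made-up numbers, not the Angrist & Krueger estimates, and the function name is hypothetical.

```python
import math


def posterior_mean_negative(pi_hat, se):
    """Mean of N(pi_hat, se^2) truncated to (-inf, 0):
    pi_hat - se * phi(z) / (1 - Phi(z)), with z = pi_hat / se.
    Adequate for moderate z; 1 - Phi(z) underflows for z above roughly 37.
    """
    z = pi_hat / se
    density = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # phi(z)
    upper_tail = 0.5 * math.erfc(z / math.sqrt(2.0))             # 1 - Phi(z)
    return pi_hat - se * density / upper_tail


# Hypothetical OLS first-stage estimates and standard errors.
pi_hat = [-0.12, -0.03, 0.01, -0.08]
se = [0.02, 0.02, 0.015, 0.05]
pi_tilde = [posterior_mean_negative(p, s) for p, s in zip(pi_hat, se)]
pi_star = [-p for p in pi_tilde]  # flip signs so the constraint reads pi* > 0
print(pi_tilde, pi_star)
```

The third coefficient has a positive OLS estimate, yet its adjusted value is negative; the first, at six standard errors below zero, is essentially unchanged.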


Our simulations fix $\pi^*/\|\pi^*\|$ at its estimated value and fix $Z'Z$ at its value in the data. By the equivariance argument in Appendix E, we can fix $\sigma_U^2 = \sigma_V^2 = 1$ and $\beta = 0$ in our simulations, so the only remaining free parameters are $\|\pi\|$ and $\sigma_{UV}$. We consider $\sigma_{UV} \in \{0.1, 0.5, 0.95\}$ and a grid of nine values for $\|\pi\|$ such that the mean of the first stage F statistic varies between 2 and 11.2. For each pair of these parameters we set



and draw $\xi$ as